What can 311 noise complaints in Gowanus tell us about gentrification?

Using Python and NYC Open Data to identify patterns in my favorite Brooklyn neighborhood

Sarah Schoengold
Towards Data Science

--

A view of the Gowanus Canal taken in 2010, the year it was given Superfund Site status. Photo Credit

Gowanus is a Brooklyn neighborhood tucked between Park Slope and Carroll Gardens along the Gowanus Canal, an industrial waterway now infamous for its “black mayonnaise,” the term for its polluted false bottom. Making up just over 5% of Brooklyn’s population, the neighborhood lay relatively quiet after the thoroughfare became largely obsolete due to an incompatibility with containerization in the mid-20th century. In the past 15 years, it’s undergone drastic change, in part due to a residential rezoning in 2003 and Superfund Site designation in 2010. Today, Gowanus is rapidly gentrifying.

I lived in Gowanus before the Whole Foods and before the Kentile Floors sign came down. The changes in the built environment since I left the neighborhood in 2013 have been quite striking. In addition to construction introducing new housing stock along the water, the neighborhood is investing in green space, making efforts to improve livability. With so much change, I’m interested in how the neighborhood has voiced its concerns over time.

Picture of the Gowanus Canal from the 3rd St Bridge in 2012 (left) features the back of the Kentile Floors sign. Today, the sign is gone and the bridge view showcases high-end advertisements (right).

NYC 311 is the non-emergency call system that allows citizens to report issues in their neighborhood. Since its origin, it’s been a type of quick snapshot of a region’s pulse. Specifically, noise complaints have been used as one (imperfect) proxy to help understand gentrification. They speak to the increase of people on the street, but also to who is using the call line to report noise: potentially those less familiar with the baseline sounds of the neighborhood.

By looking at noise complaints in 311 data from 2010–2017, I hope to identify how — if at all — complaints in Gowanus deviate from Brooklyn’s overall distribution. What can this data-driven story tell us about when the neighborhood changed, and what does the trend mean for the future of my favorite Brooklyn neighborhood?

The data

The 311 dataset is available on NYC Open Data from 2010 to present. The dataset is massive, so I used the API to filter it down to Brooklyn calls only. The noise complaints are divided into four types: 1) General; 2) Residential; 3) Commercial; 4) Street/Sidewalk.
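Narrowing the data to these four noise types can be sketched roughly as follows. This is a minimal sketch, not the project's actual code: the column name `Complaint Type`, the exact category labels, and the helper `keep_noise_complaints` are all assumptions, so check them against your own download.

```python
import pandas as pd

# Assumed labels for the four noise categories; verify against your data.
NOISE_TYPES = [
    "Noise",                    # general
    "Noise - Residential",
    "Noise - Commercial",
    "Noise - Street/Sidewalk",
]


def keep_noise_complaints(calls: pd.DataFrame, column: str = "Complaint Type") -> pd.DataFrame:
    """Return only the rows whose complaint type is one of NOISE_TYPES."""
    return calls[calls[column].isin(NOISE_TYPES)].copy()


# Toy rows standing in for the Brooklyn 311 export.
sample = pd.DataFrame({
    "Complaint Type": ["Noise - Commercial", "HEATING", "Noise", "Street Condition"],
    "Created Date": ["2016-05-01", "2016-05-02", "2016-05-03", "2016-05-04"],
})
brooklyn_noise = keep_noise_complaints(sample)
```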

After using Python to filter out other complaint types, I was left with 664,116 Brooklyn calls to work with. The output of brooklyn.head() looked something like the following:

Isolating Gowanus

Identifying calls that took place in Gowanus required some data munging. This was a good opportunity to play with the GeoPandas and Shapely Python packages, something that I’d first encountered in my Urban Informatics class at NYU (see here a [slightly messy] assignment that questions the equity of LinkNYC Wi-Fi hub distribution using these packages as well).

First, I combined the latitude and longitude provided for each row, and converted them to the appropriate geometry. This step is crucial because it introduces a spatial meaning to the data. These lat/longs are not just numbers; they have an associated coordinate reference system (CRS).

I won’t include all the code for this project here (for that, check out my GitHub) but below are a few key snippets in case you’d like to do something similar:

import geopandas as gpd
from shapely.geometry import Point

# creating a column which combines longitude and latitude (x, y order)
# list() is needed on Python 3, where zip returns a lazy iterator
brooklyn['lonlat'] = list(zip(brooklyn['Longitude'], brooklyn['Latitude']))
# creating a geometry column using shapely
# this says: "these aren't numbers; they have a spatial definition"
brooklyn['geometry'] = brooklyn['lonlat'].apply(Point)
# assigning geometry and crs (EPSG:4326 is plain WGS 84 lat/long),
# and converting into a geodataframe
brooklyn = gpd.GeoDataFrame(brooklyn, geometry='geometry', crs='EPSG:4326')

Next, I used Google Maps to eyeball latitude and longitude coordinates for what I defined as the Gowanus neighborhood. I used Shapely to create a Polygon geometry for these points, shown in the map below. From there, I used a for loop with the contains() function to ask whether each recorded call's location fell within the area I'd identified as Gowanus, which resulted in a boolean array.

I used Google Maps to eyeball my Gowanus boundary. From there I used Python to create a new dataframe with calls that were contained within the polygon using the GeoPandas and Shapely packages.

I added the array as a new column called “is_gowanus” to my Brooklyn dataframe. Picture a column of all “True” or “False,” designating whether each call’s lat/long falls within the Polygon. Using that column, I created a new dataframe called “Gowanus,” where: gowanus = brooklyn[brooklyn['is_gowanus'] == True]

At the end of this spatial analysis exercise, my Gowanus dataframe had 7,262 rows, compared to the Brooklyn dataframe which remained at 664,116 rows.
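The containment step can be sketched like this. The boundary coordinates below are a hypothetical rectangle made up for illustration, not the polygon used in this project, and the toy `brooklyn` frame only carries the two columns the check needs.

```python
import pandas as pd
from shapely.geometry import Point, Polygon

# Hypothetical boundary as (longitude, latitude) pairs; NOT the project's polygon.
gowanus_poly = Polygon([
    (-73.998, 40.668),
    (-73.984, 40.668),
    (-73.984, 40.684),
    (-73.998, 40.684),
])

# Toy calls: the first falls inside the box, the second is well outside it.
brooklyn = pd.DataFrame({
    "Longitude": [-73.990, -73.950],
    "Latitude": [40.675, 40.700],
})

# Build a Point per call (x = longitude, y = latitude), then test containment.
points = [Point(lon, lat) for lon, lat in zip(brooklyn["Longitude"], brooklyn["Latitude"])]
brooklyn["is_gowanus"] = [gowanus_poly.contains(p) for p in points]

# Boolean indexing keeps only the rows flagged True.
gowanus = brooklyn[brooklyn["is_gowanus"]]
```

If the data is already a GeoDataFrame, a vectorized call such as `brooklyn.within(gowanus_poly)` should give the same boolean column without an explicit loop.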

Exploratory analysis

Now that we have a dataframe for each group, all of Brooklyn and Gowanus, we can do a little digging. First, I grouped data by year and complaint type in order to visualize the data over time (this required some timeseries gymnastics in Python). The plots below show how Brooklyn and Gowanus complaints have evolved.

These plots show the distribution of 311 noise complaint types from 2010 to 2017. While the volume of calls follows roughly the same trend, the noise types appear to follow a different distribution.

The first stacked bar plot shows the steady increase of noise complaints in all of Brooklyn via 311, from 2010 to 2017 to date. Notice that overall, “Residential” noise consistently has the most complaints. The second stacked bar plot shows the steady increase of noise complaints in the Gowanus area over the same period of time. Instead of Residential noise, the Gowanus area has mostly Commercial noise complaints, shown in gray. Commercial calls grow to make up the majority of all noise complaints, especially in 2016, a pattern not observed in Brooklyn overall.

These two plots show the same general trend of increasing calls overall throughout the years, but do deviate from each other in terms of the types of noise complaints coming in. This could be due largely to the zoning designation of the area, but is worth investigating further.
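The year-by-type grouping behind stacked bar plots like these can be sketched as follows. The column names `Created Date` and `Complaint Type` are assumptions about the 311 export, and the rows are toy data.

```python
import pandas as pd

# Toy data; column names are assumptions about the 311 export.
calls = pd.DataFrame({
    "Created Date": pd.to_datetime([
        "2015-03-01", "2015-07-15", "2016-01-10", "2016-06-20", "2016-06-21",
    ]),
    "Complaint Type": [
        "Noise - Residential", "Noise - Commercial",
        "Noise - Commercial", "Noise - Commercial", "Noise - Residential",
    ],
})

# Count calls per (year, complaint type), then pivot the types into columns
# so each row is a year and each column is one noise category.
counts_by_year = (
    calls.groupby([calls["Created Date"].dt.year, "Complaint Type"])
    .size()
    .unstack(fill_value=0)
)

# counts_by_year.plot(kind="bar", stacked=True) would draw the stacked bars
# (requires matplotlib).
```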

Testing for Stationarity

The next part of my analysis used stationarity tests to look at the noise complaint patterns in Gowanus. If data is stationary, then it has the same statistical properties, like mean and variance, over time. If data is not stationary, it’s often due to what’s called a unit root: a stochastic trend in which random shocks persist rather than fade, which makes the series hard to predict.

Using rolling means to visually assess stationarity

First, because there are relatively few months’ worth of data (fewer than 100), it may be possible to spot trends visually. I created a count of noise complaint calls per month since 2010, grouping by month and year. The following plots show the data over time, with a rolling mean window of 10.

These plots show total noise complaints by month. Notice that there is some periodicity in Brooklyn (right) that doesn’t seem to be mirrored in the Gowanus plot (left).

These plots show the volume of complaints over time, by month, in Gowanus and Brooklyn respectively from 2010 to Dec 2017. Notice that the general trends of the rolling means follow roughly the same pattern, steadily increasing as time goes on. The Brooklyn plot appears to have more regular seasonality, while the Gowanus plot has a few interesting spikes in spring/summer of 2014. In 2017, there’s another spike in the winter that doesn’t seem to be reflected in the overall Brooklyn plot.
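The monthly counts and 10-month rolling mean behind plots like these can be sketched as below. The `Created Date` column name is an assumption, and the timestamps are toy data built so that month 1 has one call, month 2 has two, and so on.

```python
import pandas as pd

# Toy timestamps: one call in the first month, two in the second, etc.
dates = []
for i, month_start in enumerate(pd.date_range("2010-01-01", periods=24, freq="MS")):
    dates.extend([month_start] * (i + 1))
calls = pd.DataFrame({"Created Date": dates})

# Count calls per calendar month (year and month together).
monthly = calls.groupby(calls["Created Date"].dt.to_period("M")).size()

# 10-month rolling mean; the first 9 months are NaN until the window fills.
rolling = monthly.rolling(window=10).mean()
```

Plotting `monthly` and `rolling` on the same axes gives the raw series with its smoothed trend on top.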

It wasn’t entirely clear visually if the overall trends differed after plotting the timeseries data. To dig further required normalizing the Gowanus call data, and using stationarity statistical tests to determine if Gowanus is in fact stationary, or if it has a unit root.

Augmented Dickey-Fuller (ADF) Tests for Stationarity

First, I made the Gowanus data a ratio. Instead of using the total number of noise complaints, I looked at the percentage of noise complaints from Gowanus, compared to the rest of Brooklyn. You’ll notice that the plot seems pretty steady over time, with a few potential points of change in 2016 and 2017. Using this normalized data, we can really get to the core of our question about 311 noise complaint behavior.

This plot shows the ratio of noise complaints in Gowanus to all noise complaints in Brooklyn over time.
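One way to build a ratio series like this is sketched below, with made-up monthly counts. The reindexing step is an assumption about how to handle months with no Gowanus calls (they count as zero rather than going missing); the variable names are illustrative.

```python
import pandas as pd

months = pd.period_range("2016-01", periods=4, freq="M")

# Toy monthly noise-complaint counts; Gowanus has no calls in the third month.
brooklyn_monthly = pd.Series([1000, 1200, 800, 1000], index=months)
gowanus_monthly = pd.Series([10, 18, 25], index=months[[0, 1, 3]])

# Align Gowanus to Brooklyn's months, treating missing months as zero calls.
gowanus_aligned = gowanus_monthly.reindex(brooklyn_monthly.index, fill_value=0)

# Share of Brooklyn's noise complaints that came from Gowanus, in percent.
gowanus_share = 100 * gowanus_aligned / brooklyn_monthly
```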

I chose to use an Augmented Dickey-Fuller (ADF) test to assess stationarity. The ADF test’s null hypothesis is that there is a unit root, meaning there is not stationarity. If we can reject the null hypothesis, it generally means the data is stationary. I used the Statsmodels package in Python to implement the test.

The one-line test resulted in a p-value of 0.00132, which means we can reject the null at the conventional 5% significance level. This test’s hypothesis is a weird double negative, but what this means is that although there are some visual changes, this data is stationary overall. When looking at just the past three years, the p-value remained close to zero, also denoting stationarity.

Commercial Noise Complaints

Seeing the results of the stationarity test was a bit surprising, but it may not tell the full story. Our exploratory analysis showed large deviations across the noise complaint subtypes. To look at these in particular, I duplicated the analysis using only “Noise - Commercial” complaints.

This plot shows the percentage of commercial noise complaints in Gowanus over time. Notice the irregularity in the past three years.

This plot looks at the percentage of commercial noise complaints over time in the Gowanus area. Unlike all noise complaints over time, this plot shows what appears to be major deviation in the past few years, including distinct, irregular spikes in 2016 and 2017. When conducting the Augmented Dickey-Fuller test for the past five years in particular, we get a p-value of 0.0541. Here, we cannot reject the null at the 5% level (just barely), which suggests these data are not stationary. Something here is a little different in Gowanus.

Conclusion and limitations

So what can 311 calls in Gowanus tell us about gentrification? Absolutely nothing for sure.

The results of our tests show that while these noise complaint data are stationary overall, they do not appear stationary for certain complaint types within Gowanus. The top descriptor for “Noise - Commercial” in the 311 dataset is “Loud Music/Partying.” This could mean that as the neighborhood gentrified, there have been more loud parties that disturb residents.

However, all this really means for sure is that people called about loud parties more often in 2016. This brings up an interesting limitation of this study: there is known bias in 311 reporting (see studies that investigate the ratio of reports to actual violations). Although it’s hard to draw concrete conclusions from the data due to its bias, we can still use this data-driven anecdote to support the timing of “there goes the neighborhood” claims.

View from Smith Ave F stop looking over the canal towards Manhattan, 2013

Next steps: Periodicity and Points of Change

Next steps on this project would be to investigate the exact point of change for different types of complaints (points that deviate more than 3 sigma from the mean). There also appears to be clear periodicity in the Brooklyn timeseries data which isn’t as obvious in the collection of Gowanus calls. It’s possible that the Gowanus periodicity is there too, but that it’s hidden in the noise because there are far fewer Gowanus calls (for counts, the signal-to-noise ratio scales roughly as sqrt(N), so it shrinks as N does). A Fourier analysis could reveal if there’s a similar periodicity, or if it’s another area in which the neighborhood deviates from the rest of Brooklyn.
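Both ideas can be prototyped with NumPy alone. The sketch below is illustrative only: `flag_outliers` and `dominant_period` are hypothetical helper names, and the two series are synthetic (a clean 12-month cycle, and a flat series with one spike), not 311 data.

```python
import numpy as np


def flag_outliers(values, n_sigma=3.0):
    """Return indices of points more than n_sigma standard deviations from the mean."""
    values = np.asarray(values, dtype=float)
    deviation = np.abs(values - values.mean())
    return np.flatnonzero(deviation > n_sigma * values.std())


def dominant_period(values):
    """Return the period (in samples) of the strongest non-constant Fourier component."""
    values = np.asarray(values, dtype=float)
    # Subtract the mean so the zero-frequency term does not dominate.
    spectrum = np.abs(np.fft.rfft(values - values.mean()))
    freqs = np.fft.rfftfreq(len(values), d=1.0)  # cycles per sample
    peak = np.argmax(spectrum[1:]) + 1           # skip the zero-frequency bin
    return 1.0 / freqs[peak]


# Synthetic monthly counts: 8 years with a yearly cycle.
t = np.arange(96)
seasonal = 100 + 20 * np.sin(2 * np.pi * t / 12)

# Synthetic flat series with a single spike at month 40.
spiky = np.full(96, 100.0)
spiky[40] = 200.0

period_in_months = dominant_period(seasonal)
spike_months = flag_outliers(spiky)
```

On real counts, a large spike inflates the standard deviation itself, so a rolling or robust estimate of spread may be a better baseline than the global one used here.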

More to come! See the full exploration on GitHub.

--

I’m a product person, with interest in urban design, data science, and digital storytelling.