Noisy Neighbors and Traffic Trouble

An Analysis of NYC 311 Service Requests

Avonlea Fisher
Towards Data Science


Photo by Mike C. Valdivia on Unsplash

Introduction

NYC Open Data is a vast trove of City government datasets that have been made available to the public. One such dataset, 311 Service Requests from 2010 to Present, will be the focus of this article. This 311 data is updated daily and contains information about more than 24 million service requests made since 2010. For those who aren’t familiar, 311 is a phone number used in the U.S. that allows callers to access non-emergency municipal services, report problems to government agencies, and request information. This article discusses my process for exploring trends in a recent subset of the data (June-Nov. 2020), and for building a neural network to classify the government agency that responded to a given call. The full code for this project is available on its GitHub repository.

Obtaining the Data

The 311 data was obtained through the Socrata Open Data (SODA) API, which partners with NYC Open Data to host City data. The code below creates a data frame with data for the 1.5 million most recent calls.
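
The original snippet is not reproduced here; a minimal sketch of such a request, assuming the sodapy client and the 311 dataset’s public identifier (“erm2-nwe9”), might look like this:

```python
import pandas as pd
from sodapy import Socrata

# Connect to the NYC Open Data domain; an app token avoids strict
# rate limiting but is optional for light use.
client = Socrata("data.cityofnewyork.us", app_token=None)

# "erm2-nwe9" is the public identifier of the 311 Service Requests
# dataset. Sorting by creation date keeps the most recent calls.
results = client.get(
    "erm2-nwe9",
    order="created_date DESC",
    limit=1_500_000,
)

# SODA returns a list of dictionaries; every value comes back as a string.
df = pd.DataFrame.from_records(results)
```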

This project also uses data from the NYC Department of City Planning’s Community District Profiles. A CSV file containing indicators for all community districts can be downloaded from any district profile. Both the 311 data and the indicators data contain a column for the community district/board. After the values and column name in the indicators data frame were reformatted, it could be merged with the 311 data on their shared “community board” column:
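
A sketch of the merge, with illustrative column names (the downloaded CSV may label the district column differently):

```python
import pandas as pd

# Column names here are illustrative, not the CSV's actual headers.
indicators = pd.read_csv("community_district_indicators.csv")
indicators = indicators.rename(columns={"district": "community_board"})

# After the district values are reformatted to match the 311 data's
# convention (step omitted here), merge on the shared column.
merged = df.merge(indicators, on="community_board", how="inner")
```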

Exploratory Analysis

Using Plotly Express bar charts, I visualized the distribution of calls across different complaint types and government agencies. There were dozens of unique complaint types, so the first plot displays only the top 30 most frequent complaints:
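
A sketch of how the first chart can be produced, assuming the “complaint_type” column from the 311 schema:

```python
import plotly.express as px

# Count calls per complaint type and keep the 30 most frequent.
top_complaints = df["complaint_type"].value_counts().nlargest(30).reset_index()
top_complaints.columns = ["complaint_type", "count"]

fig = px.bar(
    top_complaints,
    x="complaint_type",
    y="count",
    title="Top 30 Complaint Types, June-Nov. 2020",
)
fig.show()
```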

Plot by Author

We can see that complaints related to noise, vehicles/traffic, utilities and trees are the most common, with noise complaints standing out as the clear majority category.

Plot by Author

The agency column is similarly imbalanced: since the NYPD handles noise complaints, it responds to the vast majority of 311 calls. The Department of Housing Preservation and Development (HPD) and the Department of Parks and Recreation (DPR) also respond to a significant proportion of calls.

To explore how the volume of calls received in different complaint categories changed over time, the function below generated daily call frequency dictionaries for a given complaint type:
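
A minimal reconstruction of such a function, assuming the “day” column described below:

```python
def daily_counts(df, complaint_type):
    """Map each day to the number of calls of the given complaint type.

    Assumes a "day" column of datetime.date values (created below).
    """
    subset = df[df["complaint_type"] == complaint_type]
    return subset.groupby("day").size().to_dict()
```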

There was no “day” column in the original data frame, but after the “created date” column was converted to datetime values, a new “day” column could be generated by applying datetime.date() to each row. This allowed for the creation of area plots that show the change in daily call counts over time. The following plot, for example, shows the daily counts for calls related to COVID-19.
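
A sketch of the day-column conversion and an area plot; the exact complaint-type strings are assumptions based on the categories discussed below:

```python
import pandas as pd
import plotly.express as px

# Parse the creation timestamps, then reduce each one to a calendar day.
df["created_date"] = pd.to_datetime(df["created_date"])
df["day"] = df["created_date"].apply(lambda ts: ts.date())

# Daily counts for the two COVID-related categories on one area chart.
covid_types = ["Non-Compliance with Phased Reopening", "Mass Gathering Complaint"]
daily = (
    df[df["complaint_type"].isin(covid_types)]
    .groupby(["day", "complaint_type"])
    .size()
    .reset_index(name="calls")
)
fig = px.area(daily, x="day", y="calls", color="complaint_type")
fig.show()
```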

Plot by Author

The non-compliance calls peaked in August and, with the exception of a couple of dips, consistently exceeded 200 calls per day. Mass gathering complaints, on the other hand, were not recorded during the summer, and hovered just above 100 daily calls at their peak. This does not indicate that no mass gatherings occurred in the summer (as a New Yorker, I can confidently attest to the contrary). It means, simply, that this complaint category was not formally introduced to the data until the fall. Let’s now take a look at noise complaints, which were the most common type:

The number of residential noise complaints surpassed that of other noise categories, exceeding 4,000 calls on some days. We would expect people in their homes to be particularly bothered by excessive noise coming from nearby locations; and with all the tiny, packed apartments in this city, it is no surprise that residential noise takes the lead here. Vehicle and sidewalk/street noise complaints are also fairly common. Next, we’ll look at a complaint type with a single peak in call count: damaged trees.

Plot by Author

Damaged tree calls are relatively rare on most days, but on August 4th and the subsequent couple of days, the call counts exceeded 12,000. A quick Google search revealed the likely culprit: Tropical Storm Isaias, which not only damaged thousands of trees, but also caused mass power outages and took the lives of at least five people. This plot demonstrates just one form of the wide-sweeping, devastating damage that natural disasters can cause.

Using Plotly’s Mapbox Density Heatmap and the location data for the calls, it was possible to create animations that show how call volume varied across the 5 boroughs:
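
A sketch of what a plot_calls() function of this kind might look like, assuming the latitude/longitude columns from the 311 schema and the “day” column created earlier:

```python
import plotly.express as px

def plot_calls(df, month):
    """Animated density heatmap of call locations for a given month."""
    subset = df[df["created_date"].dt.month == month].dropna(
        subset=["latitude", "longitude"]
    ).copy()
    # SODA returns coordinates as strings; Plotly needs numbers.
    subset["latitude"] = subset["latitude"].astype(float)
    subset["longitude"] = subset["longitude"].astype(float)
    subset = subset.sort_values("day")
    subset["day"] = subset["day"].astype(str)  # one animation frame per day
    return px.density_mapbox(
        subset,
        lat="latitude",
        lon="longitude",
        radius=2,
        animation_frame="day",
        center={"lat": 40.7128, "lon": -74.0060},
        zoom=9,
        mapbox_style="carto-positron",
    )

plot_calls(df, month=8).show()  # e.g. August, as in the animation below
```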

Animation by Author

The animation above shows, as an example, what the plot_calls() function rendered for the month of August. Yellow areas indicate high call volume and appear predominantly in the Bronx, Manhattan, and parts of Brooklyn. Observe how, on the days with thousands of damaged tree complaints, the entire city flares up in yellow.

The scatterplot below uses information from both datasets to show how average daily call volume (per square mile) and the crime rate (per 1,000 residents) vary across different community districts. The data dictionary for the community district indicators defines the crime rate variable as the “rate of 7 major felonies, as defined by the NYPD, reported in the community district in 2016 per 1,000 residents.” The Chart Studio rendering condenses some of the information within the plot, but a community district’s crime rate and average daily call volume can be displayed by hovering over one of the points. Larger points indicate a higher crime rate.
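
A sketch of the scatterplot, assuming a district_stats frame with one row per community district; the metric column names are illustrative:

```python
import plotly.express as px

fig = px.scatter(
    district_stats,
    x="avg_daily_calls_per_sqmi",
    y="crime_rate_per_1000",
    size="crime_rate_per_1000",  # larger points = higher crime rate
    hover_name="community_board",
)
fig.show()
```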

Plot by Author

Manhattan Community Board 5, located in Midtown, had the highest crime rate at 29.4 per 1,000 residents. Two community boards in the Bronx had the highest daily averages for 311 calls, but lower crime rates than many other districts. Staten Island, compared to other boroughs, appears to have both low crime rates and low average volumes of 311 calls. These observations suggest that, contrary to what our intuitive expectations may be, the rate of crime and the rate of non-emergency violations or problems do not have a clear linear relationship. It’s important to note, furthermore, that the data only reflects crimes and non-emergency issues that have been reported, and consequently does not paint a complete and accurate picture of these variables.

Classifying the Responding Agency

As previously illustrated, the NYPD is responsible for responding to most 311 calls, while some agencies respond to very few. A basic classifier that predicts that every call is responded to by the NYPD would still be accurate just over 50% of the time. In a balanced dataset, by contrast, such a classifier would be accurate only about 7% of the time (one in 14 agencies). When dealing with imbalanced classes like these, it’s important to adjust how we evaluate a model’s accuracy: 50% accuracy in classifying the agency does not indicate strong performance, despite the large number of classes, because a single class accounts for the majority of calls.

The community districts’ numeric data were not strong predictors of the agency that responded to 311 calls made within them. The calls’ descriptors, unsurprisingly, were far more effective. There were just over 800 unique descriptors, including chin-scratchers like “bingo hall”, and, simply, “monkey.” The word cloud below shows which words appeared most frequently in the descriptor column.
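
A minimal sketch of generating such a cloud with the wordcloud package (the image mask used in the published version is omitted here):

```python
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Join every descriptor into one blob; WordCloud sizes words by frequency.
text = " ".join(df["descriptor"].dropna().astype(str))
cloud = WordCloud(background_color="white", max_words=200).generate(text)

plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.show()
```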

Wordcloud by Author. Mask by Vectorstock on vectorstock.com.

Words that appear larger in this cloud are those that were more common in the descriptors. Given that noise was the most common complaint type category, it makes sense that words like “music,” “party,” and “loud” appear the largest. Words related to parking/vehicles, streets, sidewalks, and trees are also featured prominently in the word cloud.

In preparing the training data, calls that received responses from minority agencies were oversampled with the custom function below. In this function, ‘n_samples_dict’ is a dictionary with the number of samples that would be taken for each agency if the weights of the agencies in the training subset were to match those of the entire dataset.
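
A reconstruction of what the oversampling function may have looked like, using the article’s 400-call floor; the frame and column names (‘train’, ‘agency’) are assumptions:

```python
import pandas as pd

def oversample(train, agency, n_samples_dict, min_samples=400):
    """Sample one agency's calls, topping up rare agencies.

    Rare agencies are sampled with replacement up to "min_samples" so
    that every agency is meaningfully represented in the training data.
    """
    agency_calls = train[train["agency"] == agency]
    n = max(n_samples_dict[agency], min_samples)
    # Replacement is only required when asking for more rows than exist.
    return agency_calls.sample(n=n, replace=n > len(agency_calls), random_state=42)

# Apply to each agency and concatenate into a single training frame.
train_df = pd.concat(
    [oversample(train, a, n_samples_dict) for a in train["agency"].unique()]
)
```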

The function was applied to each agency name, and the weighted samples were concatenated into a single data frame. This sampling technique ensured that, in the training data, all agencies were represented and that no agency had a sample of fewer than 400 calls. In the training subset, the largest agency sample, NYPD, consisted of 60k calls. To ensure that every unique descriptor was also represented, rows with descriptors that didn’t appear in the original training data samples were subsequently identified and added.

A Keras Sequential model, which is appropriate for a network with a single input and a single output, was created to classify the agency. Both the input variable (descriptor) and the output variable (agency) required some further preprocessing before being fed into the model. The code below shows how the descriptor column from the data frame of weighted samples was processed.
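
A sketch of that processing with Keras’s Tokenizer, using the methods described below:

```python
from tensorflow.keras.preprocessing.text import Tokenizer

# Fit the tokenizer's vocabulary on the descriptors, then encode each
# descriptor as a binary bag-of-words row: 1 if a word is present, 0 if not.
tokenizer = Tokenizer()
tokenizer.fit_on_texts(train_df["descriptor"])
X = tokenizer.texts_to_matrix(train_df["descriptor"], mode="binary")

# word_index maps each word to its frequency rank (1 = most frequent).
print(tokenizer.word_index["music"])  # -> 1 in this data
```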

In natural language processing, a tokenizer separates a text (here, the descriptors) into smaller “tokens” (the words within the descriptors). The Keras fit_on_texts() method updates the tokenizer’s vocabulary based on the list of texts that is passed in. The texts_to_matrix() method converts the list of texts to a NumPy matrix, and specifying the “binary” mode means that only the presence or absence of each word is accounted for in the matrix. The word index is a dictionary whose keys are the words in the tokenizer’s vocabulary and whose values are integers corresponding to each word’s frequency rank. The word “music,” for example, had a value of 1 because it is the most frequent word in the descriptor column.

The agency values were processed with sklearn’s label encoder, as shown below.
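
A sketch of that step; the one-hot encoding here uses Keras’s to_categorical, one common choice, as an assumption:

```python
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.utils import to_categorical

# Map each of the 14 agency names to an integer, then one-hot encode so
# the model cannot infer a spurious ordering among agencies.
encoder = LabelEncoder()
y_int = encoder.fit_transform(train_df["agency"])
y = to_categorical(y_int)  # shape: (n_samples, 14)
```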

The label encoder assigns a unique integer to each of the 14 agencies in the column. Note that both the descriptor and agency data were one-hot encoded. One-hot encoding converts a single categorical variable into multiple binary variables, where each new variable represents one value of the original. It can be used to process categorical data when there is no ordinal relationship between categories. Without this step, the model may identify a relationship that doesn’t actually exist and misclassify the input data accordingly.

Given that the models were trained on data in which minority agencies were oversampled, evaluating their performance only on test data with similar oversampling would produce misleading results. For this reason, performance was also evaluated on a random subset of the original data that underwent the same one-hot encoding process shown above. The function below was used to fit and evaluate a model given a dictionary of parameters. It returns plots of the accuracy and loss curves, as well as accuracy scores for both the test data and the random subset (‘r_test’ and ‘r_label_test’).
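
A sketch of such a helper; the two-layer architecture and the parameter keys are assumptions, since the article confirms only a Sequential model and a parameter dictionary:

```python
import matplotlib.pyplot as plt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

def fit_and_evaluate(params, X_train, y_train, X_test, y_test,
                     r_test, r_label_test):
    """Build, train, and evaluate a model from a parameter dictionary."""
    model = Sequential([
        Dense(params["units"], activation="relu",
              input_shape=(X_train.shape[1],)),
        Dropout(params["dropout"]),
        Dense(y_train.shape[1], activation="softmax"),
    ])
    model.compile(optimizer=params["optimizer"],
                  loss="categorical_crossentropy", metrics=["accuracy"])
    history = model.fit(X_train, y_train,
                        epochs=params["epochs"],
                        batch_size=params["batch_size"],
                        validation_split=0.2, verbose=0)

    # Plot the accuracy and loss curves from the training history.
    for metric in ("accuracy", "loss"):
        plt.plot(history.history[metric], label=f"train {metric}")
        plt.plot(history.history[f"val_{metric}"], label=f"val {metric}")
        plt.legend()
        plt.show()

    # Score on both the oversampled test split and the untouched subset.
    print("test accuracy:", model.evaluate(X_test, y_test, verbose=0)[1])
    print("random subset accuracy:",
          model.evaluate(r_test, r_label_test, verbose=0)[1])
    return model
```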

The overall best-performing model had the following parameters passed in:
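
The original parameter dictionary is not preserved here; the values below are illustrative placeholders, consistent only with the training run lasting well past 50 epochs:

```python
# Illustrative values only; X_train, y_train, etc. come from splitting
# the encoded X and y produced above.
params = {
    "units": 64,
    "dropout": 0.2,
    "optimizer": "adam",
    "epochs": 100,
    "batch_size": 32,
}
model = fit_and_evaluate(params, X_train, y_train, X_test, y_test,
                         r_test, r_label_test)
```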

The model had 90% accuracy on the test data and 73% accuracy on the random subset that didn’t undergo any resampling. While these are far from perfect accuracy scores, they are substantially higher than what we would expect to get from classifying all calls’ responding agency as the NYPD. The plots below visualize the learning history of the model:

The accuracy curves, despite remaining roughly constant for about the first 50 epochs, moved upward thereafter. The loss curves, similarly, moved downward as the number of epochs increased. Given that neither set of curves worsened as training continued, we can be reasonably confident that the model is not overfit to the training data.

Conclusion

The description of a 311 call, after being encoded as numeric data, is a strong predictor of the agency that responded to the call. Noise complaints make up the majority of 311 calls and are assigned to the NYPD. This analysis lends itself to further inquiry into how non-emergency service requests are handled in NYC and other cities. If a similar classifier were trained on larger and more diverse description data, could the assignment of government agencies to non-emergency requests be handled more efficiently? In terms of meeting the needs of city residents, does it make sense for the same agency responsible for emergency calls to handle most 311 requests? Future research in the fields of data science and public policy should explore these questions further.
