A Census-based Deprivation Index using R

Ivan Castro
Towards Data Science
Aug 29, 2018


Area deprivation scores at the county level. Brighter colors represent higher deprivation relative to all counties in the U.S.

Where you live has significant effects on your health; more specifically, people living in deprived areas have worse health outcomes. For this reason, deprivation indices are frequently used in health research. To estimate neighborhood-level deprivation, most researchers rely on census data, but collecting and cleaning such data can take considerable time and effort. Here I present an R function that collects the data and creates a standardized deprivation index with little effort.

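To run this function you will need the tidycensus and psych packages; tidycensus also needs a Census API key, which can be set with census_api_key(). The plotting examples below use ggplot2. All the code is posted on GitHub.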

How it works

The index is based on the methodology of Messer and colleagues. In short, they found that the first principal component extracted from eight specific variables best represents neighborhood-level deprivation. Their approach requires the following variables at the census-tract level:

% with less than HS degree (25 years and over)

% below poverty level

% of female-headed households with children under 18

% in management, science, and arts occupations

% in crowded households (greater than 1 occupant per room)

% with public assistance or food stamps

% unemployed (16–64 years old in labor force)

% with less than 30K annual household income

The function collects census estimates, transforms the variables, and then performs a Principal Component Analysis (PCA). Estimates are collected at the tract level for a given county. Since the index has been validated in previous research, the PCA only extracts one component.
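The PCA step can be illustrated on made-up data. The following is only a minimal sketch, not the function's actual code: it uses base R's prcomp() instead of the psych package, the eight columns are simulated (their names are hypothetical), and the sign of the component is oriented by hand so that higher scores go with higher poverty.

```r
# Simulated tract-level data: 50 hypothetical tracts, eight percentage variables.
# These values are random stand-ins, not census estimates.
set.seed(42)
n_tracts <- 50
latent <- rnorm(n_tracts)  # shared "deprivation" signal driving all variables

make_pct <- function(direction) {
  # Map the latent signal plus noise onto a 0-100 scale
  raw <- direction * latent + rnorm(n_tracts, sd = 0.5)
  100 * plogis(raw)
}

tracts <- data.frame(
  pct_less_hs       = make_pct(1),
  pct_poverty       = make_pct(1),
  pct_female_hh     = make_pct(1),
  pct_mgmt_occ      = make_pct(-1),  # higher share goes with lower deprivation
  pct_crowded       = make_pct(1),
  pct_public_assist = make_pct(1),
  pct_unemployed    = make_pct(1),
  pct_low_income    = make_pct(1)
)

# Center and scale the variables, then extract principal components
pca <- prcomp(tracts, center = TRUE, scale. = TRUE)

# Keep only the first component, rescaled to mean 0 and SD 1
pc1 <- as.numeric(scale(pca$x[, 1]))

# The sign of a principal component is arbitrary; orient it so that
# higher scores correspond to higher poverty
if (cor(pc1, tracts$pct_poverty) < 0) pc1 <- -pc1

tracts$PC1 <- pc1
head(tracts[order(-tracts$PC1), c("pct_poverty", "PC1")])
```

Because the variables are standardized before the PCA, each one contributes on the same scale regardless of how large its raw percentages are.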

Using the function: countyND()

The function takes two arguments, State and County. The output variable ‘PC1’ is the deprivation index score for each corresponding census tract (CT) in the analysis. Higher index scores represent higher deprivation. These scores can be explored on their own or exported for use in statistical models.
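As a rough sketch of what exploring and exporting the scores might look like, the snippet below bins scores into quartiles and writes them to a CSV file. The data frame here is a hypothetical stand-in with invented GEOIDs and scores; only the column names GEOID and PC1 follow the function's output as described above.

```r
# Hypothetical stand-in for the function's output: tract IDs and index scores
ndi <- data.frame(
  GEOID = c("36067000100", "36067000200", "36067000300", "36067000400",
            "36067010100", "36067010200", "36067010300", "36067010400"),
  PC1   = c(1.8, 1.1, 0.6, 0.9, -0.4, -1.2, -0.7, -0.2),
  stringsAsFactors = FALSE
)

# Group tracts into quartiles of deprivation (Q1 = least, Q4 = most deprived)
breaks <- quantile(ndi$PC1, probs = seq(0, 1, 0.25))
ndi$quartile <- cut(ndi$PC1, breaks = breaks,
                    labels = c("Q1", "Q2", "Q3", "Q4"),
                    include.lowest = TRUE)

# Export for use elsewhere (written to a temporary directory in this sketch)
out_file <- file.path(tempdir(), "ndi_scores.csv")
write.csv(ndi, out_file, row.names = FALSE)

# Read back, forcing GEOID to character so it is not converted to a number
check <- read.csv(out_file, colClasses = c(GEOID = "character"))
table(check$quartile)
```

Keeping GEOID as a character string matters when joining the scores to other census data, since GEOIDs in some states begin with a zero.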

Examples

Here is the distribution of deprivation across census tracts in Onondaga County, NY.

library(ggplot2)
NDI <- countyND("NY", "Onondaga")
ggplot(NDI, aes(PC1)) + geom_histogram() + theme_classic()

I further explore the distribution of deprivation scores by assigning categories. Classifying census tracts by whether they fall within or outside of Syracuse makes it clear that most City tracts have higher neighborhood deprivation than County tracts.

# Tracts with a GEOID below this cutoff are in the City of Syracuse
NDI$type <- ifelse(as.numeric(NDI$GEOID) < 36067006104, "City Tract", "County Tract")
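The same city/county comparison can also be summarized numerically. This is only a sketch on invented data: the GEOIDs and scores below are hypothetical, and only the cutoff value comes from the code above.

```r
# Hypothetical scores for six tracts; GEOIDs and values are illustrative only
ndi <- data.frame(
  GEOID = c("36067000100", "36067002000", "36067006103",
            "36067006104", "36067010100", "36067016500"),
  PC1   = c(1.5, 0.8, 1.1, -0.3, -0.9, -0.6),
  stringsAsFactors = FALSE
)

# Same cutoff as in the article: GEOIDs below it are treated as city tracts
cutoff <- 36067006104
ndi$type <- ifelse(as.numeric(ndi$GEOID) < cutoff, "City Tract", "County Tract")

# Mean index score and number of tracts by type
means  <- tapply(ndi$PC1, ndi$type, mean)
counts <- table(ndi$type)

means
counts
```

A higher mean for the city group would match what the bar chart shows visually.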

ggplot(NDI, aes(reorder(Tract, -PC1), PC1)) +
  geom_col(aes(fill = type)) +
  coord_flip() +
  theme(axis.text.x = element_text(size = 8, color = "black"),
        axis.text.y = element_text(size = 4, color = "black")) +
  scale_fill_viridis_d(option = "cividis") +
  labs(fill = "", x = "Tract", y = "Deprivation Index")

Thematic Mapping

The index can be further explored by its spatial distribution. Mapping deprivation scores shows that high levels of deprivation concentrate within the City of Syracuse.

However, when deprivation scores are mapped for the City of Syracuse only, some variation is still noticeable.

Additionally, by omitting the County argument, the function will return a deprivation index for all counties in the given State.

NYND <- countyND("NY")
ggplot(NYND, aes(PC1, color = County)) + geom_density() +
  theme_classic() + guides(colour = "none") +
  scale_color_viridis_d() +
  labs(x = "Deprivation Index for all Counties in NYS")

Spatial distribution of deprivation in New York:

Neighborhood-level deprivation across New York

All the code and more examples can be found on GitHub. I wrote this function for my own analyses, but maybe someone else will find it useful as well.


Data Analysis & Visualization in R | Population Health, Spatial Epidemiology | I have more questions than answers | iecastro.netlify.com | github.com/iecastro