According to recent census data, 8.8 percent of people in the United States do not have health insurance. That’s 28 million people who may receive substandard medical care or be turned away because they lack coverage. Because the United States does not have nationalized healthcare, citizens rely on health insurance to cover medical expenses. A wide variety of coverage types is offered by both private companies and public programs like Medicare and Medicaid. The uninsured rate stood at 18% in 2013, before the Affordable Care Act (ACA) mandate took effect.

Why is this a problem?
For the patient, being uninsured means that you may be turned away for your inability to pay or you may receive substandard medical care (fewer tests, less attention from the physician, etc.). If the condition is severe, you may have to take on bad debt (knowingly using credit you will not be able to repay) to ensure you can access care. For the physician, it means balancing the ethics of turning away patients who cannot pay against the steps you might take to lessen their burden, for example, underwriting fees, exaggerating symptoms, or sending them to a publicly funded alternative.
Collecting and Preparing Data
I combined four datasets for this project to flesh out some of my hypothesized features. I started with a community health indicators set from HealthData.gov. Then I incorporated a few datasets from the Census Bureau to get median household income (1), demographics (2), and rural vs urban breakdowns (3).
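Combining county-level tables like these usually comes down to joining on a shared county identifier. Below is a minimal sketch in plain Python, assuming each dataset can be keyed by a 5-digit county FIPS code. The table names, column names, and values are all made up for illustration and are not taken from the actual datasets or notebook.

```python
# Hypothetical rows keyed by a 5-digit county FIPS code (all values are made up).
health = {
    "01001": {"smoker_rate": 0.21},
    "01003": {"smoker_rate": 0.18},
    "72001": {"smoker_rate": 0.12},
}
income = {
    "01001": {"median_income": 58000},
    "01003": {"median_income": 61000},
}
urban = {
    "01001": {"urban_pct": 58.0},
    "01003": {"urban_pct": 57.7},
    "72001": {"urban_pct": 40.0},
}


def inner_join(*tables):
    """Keep only counties present in every table and merge their columns."""
    shared = set(tables[0])
    for table in tables[1:]:
        shared &= set(table)
    merged = {}
    for fips in sorted(shared):
        row = {}
        for table in tables:
            row.update(table[fips])
        merged[fips] = row
    return merged


counties = inner_join(health, income, urban)
```

An inner join keeps only the counties that appear in every source, so a county missing from any one table drops out of the result.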
I imputed missing values for each county’s community health indicators using the average for the corresponding state. I had to drop some rows because some of the four datasets treat U.S. territories as county-equivalents and some remote counties had no data collected.
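State-level mean imputation can be sketched as follows. This is a dependency-free illustration, not the notebook's code: the row layout, the `smoker_rate` column, and the helper name `impute_state_mean` are hypothetical.

```python
from collections import defaultdict

# Hypothetical county rows; None marks a missing indicator value.
rows = [
    {"county": "A", "state": "AL", "smoker_rate": 20.0},
    {"county": "B", "state": "AL", "smoker_rate": None},
    {"county": "C", "state": "AL", "smoker_rate": 24.0},
    {"county": "D", "state": "GA", "smoker_rate": 18.0},
    {"county": "E", "state": "GA", "smoker_rate": None},
]


def impute_state_mean(rows, column):
    """Return copies of rows with missing `column` values set to the state mean."""
    observed = defaultdict(list)
    for row in rows:
        if row[column] is not None:
            observed[row["state"]].append(row[column])
    means = {state: sum(vals) / len(vals) for state, vals in observed.items()}

    filled = []
    for row in rows:
        new_row = dict(row)
        if new_row[column] is None:
            # A state with no observed values at all stays missing (None).
            new_row[column] = means.get(new_row["state"])
        filled.append(new_row)
    return filled


filled = impute_state_mean(rows, "smoker_rate")
```

With pandas, the same idea is typically expressed as a `groupby` on the state column followed by `transform("mean")` and `fillna`.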
Finally, I ran statistical tests to examine differences between rural and urban counties. The uninsured rate, median household income, presence of community health centers, and whether the county was medically underserved all differed significantly between rural and urban counties.
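The article does not say which test was used, so the sketch below is an assumption: it compares a continuous variable between the two groups with Welch's t statistic and takes a two-sided p-value from a normal approximation, which is only reasonable for large samples such as the full set of U.S. counties. The sample values are made up and kept tiny for readability.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_test(a, b):
    """Welch's t statistic with a two-sided p-value (normal approximation)."""
    # Standard error of the difference in means, allowing unequal variances.
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = (mean(a) - mean(b)) / se
    # For large samples the t distribution is close to the standard normal.
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p


# Made-up uninsured rates (%) for a handful of counties.
rural = [14.1, 15.3, 13.8, 16.0, 14.7, 15.5]
urban = [10.2, 11.0, 9.8, 10.9, 11.4, 10.1]

t_stat, p_value = welch_test(rural, urban)
```

For yes/no variables such as whether a county is medically underserved, a test on proportions or a contingency table would be the more natural choice than a comparison of means.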

Starting to Model
With the data cleaned and processed, I fit a linear regression on the rate of uninsured. My full notebook can be found here.
So our R² metric is about 0.4. This seems fairly low, right? Well, for a social sciences project like this, it isn’t too bad! Frankly, modeling human decision making is really difficult! While we cannot say that we have all the answers, this model will still allow us to determine some important factors in the rate of uninsured.
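For readers unfamiliar with the metric, R² is the share of variance in the target that the model's predictions account for: one minus the ratio of the residual sum of squares to the total sum of squares. A small sketch with made-up numbers, chosen so the score lands at roughly 0.4:

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


# Made-up observed and predicted uninsured rates (%).
observed = [10.0, 12.0, 14.0, 16.0, 18.0]
predicted = [14.0, 14.0, 12.0, 16.0, 18.0]

score = r_squared(observed, predicted)
```

A score near 0.4 means the predictions capture about 40% of the county-to-county variation, with the rest left unexplained by the features in the model.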
Examining Coefficients
So let’s look at our coefficients. Because we scaled our continuous variables, we can compare these coefficients against each other. Each one reads as: "For each increase of one standard deviation in ‘Elderly_Medicare’, we would expect an increase of 6.5% in the rate of uninsured".
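To see why scaling makes coefficients comparable, here is a minimal one-feature sketch with made-up data. It assumes z-score scaling (subtract the mean, divide by the standard deviation); the article does not state which scaler was used, and the real model has many features, where each coefficient also holds the other features fixed.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score: subtract the mean, divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def ols_slope(x, y):
    """Slope of a one-feature least-squares fit of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Made-up data: median household income (USD) and uninsured rate (%).
income = [40000, 45000, 50000, 55000, 60000]
uninsured = [16.0, 14.5, 13.0, 11.5, 10.0]

raw_slope = ols_slope(income, uninsured)               # change per dollar
std_slope = ols_slope(standardize(income), uninsured)  # change per one SD of income
```

The raw slope is tiny because it is measured per dollar, while the standardized slope is the raw slope multiplied by the feature's standard deviation. Putting every feature on that per-standard-deviation footing is what lets coefficients for income, population, and smoker rate be ranked against each other.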
We can see that elderly on Medicare, the ratio of physicians per 100k population, and smoker rate were the largest positive factors (that is, they increased the uninsured rate).
For negative factors, urban population percentage, overall rural population, and median household income were meaningful. What this tells us is that as counties become more populated and/or more urban, their uninsured rate declines. Furthermore, higher median household income is linked with lower uninsured rates, and urban counties have a significantly higher income than rural counties (56k vs 44k, p-value = 1.07e-80).

Conclusions
After running our linear regression, we were able to determine that a mixture of community health indicators and demographic factors had a significant effect on the rate of uninsured. A high percentage of smokers and of elderly citizens on Medicare contributed to increased uninsured rates. I posit that the elderly population on Medicare is acting as a proxy measure of the total elderly population in the county, and because the elderly and children are more likely to be uninsured, this is driving up the rate.
Other factors decreased the rate. Most notably, every one-standard-deviation increase in urban population percentage decreased the uninsured rate by 5.62%. Median household income had a smaller, but still significant, effect: a decrease of 2.44% for each one-standard-deviation increase. Finally, as county population increases, whether urban or rural, the uninsured rate decreases.
Future Directions
I would like to incorporate more granular data. For example, while household income played a role in the rate of uninsured, our data was limited to the county level, so counties with a mix of urban and rural populations may be influenced by the urban population’s higher average income.
There are also factors that could be affecting this rate that have been difficult to gather data on. I would propose a system of surveys and aggregated health records that could provide reasons why patients did not have insurance. For example, a list of health facilities with charity systems in place (where uninsured patients can access a general "fund" to pay for procedures) would let us examine how such programs affect the uninsured rate.
Sources
[1] S. Weiner, "I Can’t Afford That!: Dilemmas in the Care of the Uninsured and Underinsured" (2001), Journal of General Internal Medicine 16: 412–418.
[2] J. Cohen, "Statistical Power Analysis" (1992), Current Directions in Psychological Science 3: 98–101.
Datasets can be found in the article itself. My repo is here.