Linear Correlations of 2016 Presidential Election Data to Various Arbitrarily Chosen Data

Out of curiosity (and boredom), I have collected data on presidential election votes and associated them with data on various types of mortality rates, sexually transmitted diseases, immunization rates, criminal activity, and education scores.

Franjo Ivankovic, Ph.D.
Towards Data Science

--

Please bear in mind that these graphs and summaries represent a rather simplistic statistical analysis, and linear regressions with a single independent variable usually do not adequately explain the dependent variable. But, since it’s 3AM and I have consumed too much coffee to be able to fall asleep — let’s have some fun regardless! The data is analyzed on the state level (N_max = 50).

Unless otherwise stated, all examples are fitted to simple linear models with a single independent variable (see directly below). The instances with missing data have been excluded from analysis.

Figure 1.: A mathematical representation of a simple linear model with a single independent variable.
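
In case the figure does not display, the standard form of such a model is written out below. Here y_i is the dependent variable for state i, x_i is the single independent variable, β0 is the intercept, β1 is the slope, and ε_i is the error term. The text does not say which of the two plotted quantities plays the role of y in each fit, so treat that assignment as an assumption.

```latex
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \dots, N
```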

Mortality rate due to malignant neoplasms (per 100,000) vs. people voting for Trump (%)

Figure 2.: A scatterplot relating age adjusted mortality rate due to malignant neoplasms (i.e. cancer, in incidents per 100,000) to percentage of people voting for Trump in each state.
R²_adj = 0.21
p_F = 0.0004824 ***
p_t[𝝱0] = 0.856997
p_t[𝝱1] = 0.000482 ***

Interestingly, relating data on cancer-caused mortality to Trump votes yields a relatively weak (see R²_adj) but nonetheless seemingly significant (see p_F) relationship. Outliers in this particular model appear to be NC, UT, and WI. This relationship does not hold in the case of Clinton voters.
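
For readers who want to see where numbers like R²_adj and the statistics behind p_F and p_t[𝝱1] come from, here is a minimal sketch in plain Python. It is not the code used for this article: the names `drop_missing` and `fit_simple_ols` and the toy numbers are invented for illustration. P-values are left out because they need the t and F distributions, which the standard library does not provide.

```python
import math


def drop_missing(x, y):
    """Keep only the pairs in which both values are present (not None)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    return [a for a, _ in pairs], [b for _, b in pairs]


def fit_simple_ols(x, y):
    """Least-squares fit of y = b0 + b1 * x with basic fit statistics."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three complete (x, y) pairs")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    syy = sum((b - mean_y) ** 2 for b in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Sum of squared residuals around the fitted line.
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    r2 = 1.0 - sse / syy
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - 2)
    sigma2 = sse / (n - 2)  # residual variance estimate
    t_slope = slope / math.sqrt(sigma2 / sxx)
    f_stat = (syy - sse) / sigma2  # one numerator degree of freedom
    return {
        "n": n,
        "intercept": intercept,
        "slope": slope,
        "r2": r2,
        "r2_adj": r2_adj,
        "t_slope": t_slope,
        "f_stat": f_stat,
    }


# Toy data, not taken from the article; the pair containing None is dropped.
x, y = drop_missing([1, 2, 3, None, 4, 5], [2.1, 3.9, 6.2, 7.0, 7.8, 10.1])
fit = fit_simple_ols(x, y)
print(fit)
```

With a single predictor, `f_stat` equals `t_slope` squared, which is why p_F and p_t[𝝱1] agree in every summary in this article.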

Mortality rate due to heart disease (per 100,000) vs. people voting for Trump (%)

Figure 3.: A scatterplot relating age adjusted mortality rate due to heart disease (in incidents per 100,000) to percentage of people voting for Trump in each state.
R²_adj = 0.2553
p_F = 0.0001084 ***
p_t[𝝱0] = 0.020797 *
p_t[𝝱1] = 0.000108 ***

This model also shows a correlation with Trump support. The correlation is weak but statistically significant. As in the cancer model, the outliers appear to be NC, UT, and WI.

Mortality rate due to motor-vehicle accidents (per 100,000) vs. people voting for Trump (%)

Figure 4.: A scatterplot relating age adjusted mortality rate due to motor-vehicle accidents (in incidents per 100,000) to percentage of people voting for Trump in each state.
R²_adj = 0.4948
p_F = 7.376 * 10^(-9) ***
p_t[𝝱0] = 2.79 * 10^(-12) ***
p_t[𝝱1] = 7.38 * 10^(-9) ***

Well, this is an unexpected outcome. States with higher motor-vehicle accident mortality appear to have favored Trump in 2016. The correlation is actually in the moderate range this time around. In fact, a similarly strong relationship (but in the opposite direction) can be observed when substituting Clinton votes for Trump votes (see below).

Figure 5.: A scatterplot relating age adjusted mortality rate due to motor-vehicle accidents (in incidents per 100,000) to percentage of people voting for Clinton in each state.
R²_adj = 0.3971
p_F = 5.648 * 10^(-7) ***
p_t[𝝱0] < 2 * 10^(-16) ***
p_t[𝝱1] = 5.65 * 10^(-7) ***

Examining the model with Trump voters a bit further, the Breusch-Pagan (p = 0.1193) and Score (p = 0.1449) tests give no evidence of heteroskedasticity. The Shapiro-Wilk (p = 0.6651), Kolmogorov-Smirnov (p = 0.8544), and Anderson-Darling (p = 0.4715) tests all fail to reject normality. The Durbin-Watson test (p = 0.89) gives no evidence of correlated errors.
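
To give a feel for what two of these diagnostics measure, here is a rough sketch in plain Python. It assumes the studentized (Koenker) form of the Breusch-Pagan test, which may not be the variant behind the p-values quoted above, and it computes only the Durbin-Watson statistic, not its p-value. All function names and the toy data are invented for illustration.

```python
import math


def ols_residuals(x, y):
    """Residuals of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def r_squared(x, y):
    """R^2 of the simple regression of y on x (0.0 if y is constant)."""
    mean_y = sum(y) / len(y)
    syy = sum((b - mean_y) ** 2 for b in y)
    if syy == 0:
        return 0.0
    sse = sum(r * r for r in ols_residuals(x, y))
    return 1.0 - sse / syy


def breusch_pagan_koenker(x, y):
    """Studentized Breusch-Pagan: LM = n * R^2 from regressing e^2 on x."""
    squared = [r * r for r in ols_residuals(x, y)]
    lm = max(len(x) * r_squared(x, squared), 0.0)
    # Upper tail of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value


def durbin_watson(residuals):
    """Durbin-Watson statistic; values near 2 mean no lag-1 autocorrelation."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    return num / sum(r * r for r in residuals)


# Toy data, not taken from the article.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.0, 2.2, 2.7, 4.5, 4.1, 7.0, 5.6, 9.5]
lm, p = breusch_pagan_koenker(x, y)
dw = durbin_watson(ols_residuals(x, y))
print(f"Breusch-Pagan LM = {lm:.3f}, p = {p:.3f}, Durbin-Watson = {dw:.3f}")
```

Note that the Durbin-Watson statistic depends on the order of the rows, so for state-level data its value changes with how the states are sorted.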

Therefore, we might be able to predict with some confidence how a particular state has voted in 2016 presidential elections using only mortality rates due to motor-vehicle accidents. Alternatively, we can use states’ voting records to predict their mortality rates due to motor-vehicle accidents.
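
One caveat when swapping the roles of the two variables: regressing votes on mortality and regressing mortality on votes give two different lines, not one line read backwards. The two fits share the same R², and their slopes multiply to R². A small sketch with made-up numbers (`slope` and `pearson_r` are hypothetical helpers, and the values are not the article's data):

```python
import math


def slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


# Made-up numbers for illustration only.
mortality = [8.0, 10.5, 12.1, 15.3, 17.8, 20.2]
vote_share = [35.0, 44.0, 41.0, 52.0, 49.0, 60.0]

b_votes_on_mortality = slope(mortality, vote_share)
b_mortality_on_votes = slope(vote_share, mortality)
r = pearson_r(mortality, vote_share)

print("slope, votes on mortality:", b_votes_on_mortality)
print("slope, mortality on votes:", b_mortality_on_votes)
print("naive inverse of the first slope:", 1.0 / b_votes_on_mortality)
print("product of slopes:", b_votes_on_mortality * b_mortality_on_votes)
print("r squared:", r * r)
```

Unless the points fall exactly on a line, the reverse slope is smaller in magnitude than the naive inverse of the forward slope, so each direction of prediction needs its own fit.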

Mortality rate due to suicide (per 100,000) vs. people voting for Trump (%)

Figure 6.: A scatterplot relating age adjusted mortality rate due to suicide (in incidents per 100,000) to percentage of people voting for Trump in each state.
R²_adj = 0.24
p_F = 0.0001811 ***
p_t[𝝱0] = 2.26 * 10^(-8) ***
p_t[𝝱1] = 0.000181 ***

This model also shows a significant, albeit weak, correlation between support for Trump and mortality rates due to suicide. Conversely, states that favored Clinton show lower rates of such mortality, with an even better fit.

Figure 7.: A scatterplot relating age adjusted mortality rate due to suicide (in incidents per 100,000) to percentage of people voting for Clinton in each state.
R²_adj = 0.4636
p_F = 3.196 * 10^(-8) ***
p_t[𝝱0] < 2 * 10^(-16) ***
p_t[𝝱1] = 3.2 * 10^(-8) ***

Running diagnostics on the Trump votes model yields the following:

Breusch-Pagan p = 0.1763463 | No heteroskedasticity!
Score test p = 0.08569949 | No heteroskedasticity!
Shapiro-Wilk p = 0.2942 | Normal distribution!
Kolmogorov-Smirnov p = 0.8451 | Normal distribution!
Anderson-Darling p = 0.3797 | Normal distribution!
Durbin-Watson p = 0.704 | Uncorrelated errors!

Mortality rate due to homicide (per 100,000) vs. people voting for Trump (%)

Figure 8.: A scatterplot relating age adjusted mortality rate due to homicide (in incidents per 100,000) to percentage of people voting for Trump in each state.
R²_adj = 0.1366
p_F = 0.007208 **
p_t[𝝱0] = 2.61 * 10^(-16) ***
p_t[𝝱1] = 0.00721 **

This model shows a weak correlation between homicide mortality rates and Trump support, although the slope estimate is statistically significant (see p_t[𝝱1]). Potentially influential outliers are HI, LA, and TX.

Mortality rate due to drug poisoning (per 100,000) vs. people voting for Trump (%)

Figure 9.: A scatterplot relating age adjusted mortality rate due to drug poisoning (in incidents per 100,000) to percentage of people voting for Trump in each state.
R²_adj = -0.0179
p_F = 0.7115
p_t[𝝱0] < 2 * 10^(-16) ***
p_t[𝝱1] = 0.711

There is neither significant nor noticeable correlation between states that supported Trump in 2016 elections and mortality rate due to drug poisoning.
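
A side note on reading R²_adj for fits this weak. For a simple linear model with one predictor and N observations, the usual adjustment is the one written below, so the adjusted value drops below zero whenever the raw R² is smaller than 1/(N - 1). That threshold is roughly 0.02 if all 50 states are present, which is an assumption here because states with missing data were excluded.

```latex
R^2_{adj} = 1 - \left(1 - R^2\right)\frac{N - 1}{N - 2}
\qquad\Longrightarrow\qquad
R^2_{adj} < 0 \iff R^2 < \frac{1}{N - 1}
```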

Conclusion on Mortality Rates

It appears that the states supporting Trump in 2016 tend to suffer from higher mortality rates due to various causes than states supporting Clinton. The most prominent examples are motor-vehicle and suicide related mortalities. One notable exception to this apparent rule is the rate of mortality due to drug poisoning, the only example with no significant or noticeable correlation between the two variables.

Rate of infant deaths (per 1,000 live births) vs. people voting for Trump (%)

Figure 10.: A scatterplot relating rate of infant deaths (per 1,000 live births) to percentage of people voting for Trump in each state.
R²_adj = 0.256
p_F = 0.0001239 ***
p_t[𝝱0] = 0.000297 ***
p_t[𝝱1] = 0.000124 ***

When it comes to Trump support and infant death rate, it appears that the states favoring Trump also have slightly higher infant death rates. Albeit significant, this relationship is relatively weak with only a slight correlation between the two variables.

Rate of non-Hispanic black infant deaths (per 1,000 live births) vs. people voting for Trump (%)

Figure 11.: A scatterplot relating rate of non-Hispanic black infant deaths (per 1,000 live births) to percentage of people voting for Trump in each state.
R²_adj = 0.3689
p_F = 6.537 * 10^(-5) ***
p_t[𝝱0] = 0.0225 *
p_t[𝝱1] = 6.54 * 10^(-5) ***

The correlation is higher, and the model fits better overall, when we look only at non-Hispanic black infant deaths. Running diagnostics on this model yields pretty tidy results, as expected.

Breusch-Pagan p = 0.4146038 | No heteroskedasticity!
Score test p = 0.2699911 | No heteroskedasticity!
Shapiro-Wilk p = 0.3525 | Normal distribution!
Kolmogorov-Smirnov p = 0.8966 | Normal distribution!
Anderson-Darling p = 0.5787 | Normal distribution!
Durbin-Watson p = 0.44 | Uncorrelated errors!

As expected, the trend is exactly the opposite for the states favoring Clinton:

Figure 12.: A scatterplot relating rate of non-Hispanic black infant deaths (per 1,000 live births) to percentage of people voting for Clinton in each state.
R²_adj = 0.3361
p_F = 0.0001565 ***
p_t[𝝱0] = 2.85 * 10^(-12) ***
p_t[𝝱1] = 0.000156 ***
Breusch-Pagan p = 0.5336099 | No heteroskedasticity!
Score test p = 0.3860135 | No heteroskedasticity!
Shapiro-Wilk p = 0.2888 | Normal distribution!
Kolmogorov-Smirnov p = 0.4213 | Normal distribution!
Anderson-Darling p = 0.2453 | Normal distribution!
Durbin-Watson p = 0.514 | Uncorrelated errors!

Rate of Hispanic infant deaths (per 1,000 live births) vs. people voting for Trump (%)

Figure 13.: A scatterplot relating rate of Hispanic infant deaths (per 1,000 live births) to percentage of people voting for Trump in each state.
R²_adj = 0.09937
p_F = 0.03432 *
p_t[𝝱0] = 1.74 * 10^(-5) ***
p_t[𝝱1] = 0.0343 *

There is a negligible relationship between support for Trump in the 2016 election and the Hispanic infant death rate.

Conclusion on Infant Death Rates

In accordance with the mortality data, infant death rates also trend higher with Trump support. This is particularly true of the non-Hispanic black infant death rate, where the relationship is very obvious and the model fits quite nicely.

Rates of Chlamydia (per 100,000) vs. people voting for Trump (%)

Figure 14.: A scatterplot relating rates of Chlamydia (per 100,000) to percentage of people voting for Trump in each state.
R²_adj = -0.01594
p_F = 0.6329
p_t[𝝱0] = 1.7 * 10^(-8) ***
p_t[𝝱1] = 0.633

There appears to be no obvious or significant relationship between Trump support and rates of Chlamydia.

Rates of Gonorrhea (per 100,000) vs. people voting for Trump (%)

Figure 15.: A scatterplot relating rates of Gonorrhea (per 100,000) to percentage of people voting for Trump in each state.
R²_adj = 0.02107
p_F = 0.1582
p_t[𝝱0] = 1.13 * 10^(-15) ***
p_t[𝝱1] = 0.158

There also appears to be no obvious or significant relationship between Trump support and rates of Gonorrhea.

Rates of Syphilis (per 100,000) vs. people voting for Trump (%)

Figure 16.: A scatterplot relating rates of Syphilis (per 100,000) to percentage of people voting for Trump in each state.
R²_adj = 0.04053
p_F = 0.08614
p_t[𝝱0] < 2 * 10^(-16) ***
p_t[𝝱1] = 0.0861

There also appears to be no obvious or significant relationship between Trump support and rates of Syphilis.

Conclusion on Sexually Transmitted Diseases

There appears to be no obvious link between sexually transmitted diseases and level of support for Trump.

Estimated MMR vaccination coverage among children aged 19–35 months (%) vs. people voting for Trump (%)

Figure 17.: A scatterplot relating estimated MMR vaccination coverage among children aged 19–35 months (%) to percentage of people voting for Trump in each state.
R²_adj = 0.07459
p_F = 0.03083 *
p_t[𝝱0] = 0.00221 **
p_t[𝝱1] = 0.03083 *

The model fits poorly and the correlation is very small, hinting at most at a negative relationship between Trump support and MMR vaccination coverage.

Estimated DTaP vaccination coverage among children aged 19–35 months (%) vs. people voting for Trump (%)

Figure 18.: A scatterplot relating estimated DTaP vaccination coverage among children aged 19–35 months (%) to percentage of people voting for Trump in each state.
R²_adj = 0.2246
p_F = 0.0003003 ***
p_t[𝝱0] = 1.33 * 10^(-6) ***
p_t[𝝱1] = 3 * 10^(-4) ***

There is moderate evidence that states supporting Trump in the 2016 presidential election had lower DTaP immunization coverage. Model diagnostics don’t seem to be raising any red flags.

Breusch-Pagan p = 0.6658235 | No heteroskedasticity!
Score test p = 0.6299109 | No heteroskedasticity!
Shapiro-Wilk p = 0.7211 | Normal distribution!
Kolmogorov-Smirnov p = 0.7435 | Normal distribution!
Anderson-Darling p = 0.6068 | Normal distribution!
Durbin-Watson p = 0.296 | Uncorrelated errors!

Estimated HepB vaccination coverage among children aged 19–35 months (%) vs. people voting for Trump (%)

Figure 19.: A scatterplot relating estimated HepB vaccination coverage among children aged 19–35 months (%) to percentage of people voting for Trump in each state.
R²_adj = 0.05479
p_F = 0.05585
p_t[𝝱0] = 0.1628
p_t[𝝱1] = 0.0558

There is neither obvious nor significant evidence of a relationship between HepB vaccination coverage and support for Trump in the 2016 presidential election.

Estimated HepA vaccination coverage among children aged 19–35 months (%) vs. people voting for Trump (%)

Figure 20.: A scatterplot relating estimated HepA vaccination coverage among children aged 19–35 months (%) to percentage of people voting for Trump in each state.
R²_adj = -0.01044
p_F = 0.4857
p_t[𝝱0] = 4.1 * 10^(-5) ***
p_t[𝝱1] = 0.486

As with HepB, HepA coverage shows no relationship to the 2016 presidential election data.

Estimated Rotavirus vaccination coverage among children aged 19–35 months (%) vs. people voting for Trump (%)

Figure 21.: A scatterplot relating estimated Rotavirus vaccination coverage among children aged 19–35 months (%) to percentage of people voting for Trump in each state.
R²_adj = 0.08274
p_F = 0.02417 *
p_t[𝝱0] = 2.22 * 10^(-5) ***
p_t[𝝱1] = 0.0242 *

Although the slope is statistically significant, the correlation between the two variables is far too small to draw any conclusions.

Conclusion on Vaccinations

Apart from DTaP, vaccination coverage shows little or no association with Trump support. DTaP vaccination coverage, on the other hand, appears to be negatively correlated with Trump support.

Rate of violent crime (per 100,000) vs. people voting for Trump (%)

Figure 22.: A scatterplot relating rate of violent crime (per 100,000) to percentage of people voting for Trump in each state.
R²_adj = -0.009291
p_F = 0.4624
p_t[𝝱0] = 5.1 * 10^(-13) ***
p_t[𝝱1] = 0.462

No apparent association between these two variables.

Rate of murder and non-negligent manslaughter (per 100,000) vs. people voting for Trump (%)

Figure 23.: A scatterplot relating rate of murder and non-negligent manslaughter (per 100,000) to percentage of people voting for Trump in each state.
R²_adj = 0.04231
p_F = 0.08158
p_t[𝝱0] < 2 * 10^(-16) ***
p_t[𝝱1] = 0.0816

No apparent association between these two variables.

Rate of rape (per 100,000) vs. people voting for Trump (%)

Figure 24.: A scatterplot relating rate of rape (per 100,000) to percentage of people voting for Trump in each state.
R²_adj = 0.05454
p_F = 0.05627
p_t[𝝱0] = 8.62 * 10^(-13) ***
p_t[𝝱1] = 0.0563

No apparent association between these two variables.

Rate of robbery (per 100,000) vs. people voting for Trump (%)

Figure 25.: A scatterplot relating rate of robbery (per 100,000) to percentage of people voting for Trump in each state.
R²_adj = 0.07853
p_F = 0.02741 *
p_t[𝝱0] < 2 * 10^(-16) ***
p_t[𝝱1] = 0.0274 *

No apparent association between these two variables, despite the statistically significant slope.

Rate of aggravated assault (per 100,000) vs. people voting for Trump (%)

Figure 26.: A scatterplot relating rate of aggravated assault (per 100,000) to percentage of people voting for Trump in each state.
R²_adj = 0.05783
p_F = 0.05096
p_t[𝝱0] < 2 * 10^(-16) ***
p_t[𝝱1] = 0.051

No apparent association between these two variables.

Rate of property crime (per 100,000) vs. people voting for Trump (%)

Figure 27.: A scatterplot relating rate of property crime (per 100,000) to percentage of people voting for Trump in each state.
R²_adj = 0.008801
p_F = 0.2368
p_t[𝝱0] = 2.45 * 10^(-9) ***
p_t[𝝱1] = 0.237

No apparent association between these two variables.

Rate of burglary (per 100,000) vs. people voting for Trump (%)

Figure 28.: A scatterplot relating rate of burglary (per 100,000) to percentage of people voting for Trump in each state.
R²_adj = 0.08126
p_F = 0.02526 *
p_t[𝝱0] = 1.1 * 10^(-13) ***
p_t[𝝱1] = 0.0253 *

Despite the statistically significant slope, the correlation between the two variables is too low to draw conclusions about the relationship between them.

Rate of larceny-theft (per 100,000) vs. people voting for Trump (%)

Figure 29.: A scatterplot relating rate of larceny-theft (per 100,000) to percentage of people voting for Trump in each state.
R²_adj = 0.0008896
p_F = 0.3121
p_t[𝝱0] = 3.96 * 10^(-8) ***
p_t[𝝱1] = 0.312

No apparent association between these two variables.

Rate of motor-vehicle theft (per 100,000) vs. people voting for Trump (%)

Figure 30.: A scatterplot relating rate of motor-vehicle theft (per 100,000) to percentage of people voting for Trump in each state.
R²_adj = -0.01918
p_F = 0.7815
p_t[𝝱0] < 2 * 10^(-16) ***
p_t[𝝱1] = 0.781

No apparent association between these two variables.

Conclusions on Violent and Property Crime

There appears to be no relationship between violent and property crime, and support for Trump in 2016 presidential election.

WalletHub Educational Attainment and Quality of Education Score (1–100) vs. people voting for Trump (%)

Figure 31.: A scatterplot relating WalletHub Educational Attainment and Quality of Education Score (1–100) to percentage of people voting for Trump in each state.
R²_adj = 0.4414
p_F = 8.66 * 10^(-8) ***
p_t[𝝱0] < 2 * 10^(-16) ***
p_t[𝝱1] = 8.66 * 10^(-8) ***

There is relatively strong evidence that states supporting Trump in the 2016 presidential election had lower educational attainment and quality of education. Model diagnostics don’t seem to be raising any red flags.

Breusch-Pagan p = 0.7193591 | No heteroskedasticity!
Score test p = 0.7333875 | No heteroskedasticity!
Shapiro-Wilk p = 0.6670 | Normal distribution!
Kolmogorov-Smirnov p = 0.9474 | Normal distribution!
Anderson-Darling p = 0.5443 | Normal distribution!
Durbin-Watson p = 0.812 | Uncorrelated errors!

Based on these results, it appears that enthusiasm for Trump is negatively correlated with educational attainment and quality of education.

Final Summary

  1. Trump enthusiasm is negatively correlated with educational attainment and quality of education.
  2. Trump enthusiasm is negatively correlated with DTaP vaccination coverage.
  3. Trump enthusiasm is positively correlated with non-Hispanic black infant and overall infant death rates.
  4. Trump enthusiasm is positively correlated with mortality rates due to malignant neoplasm, heart disease, motor-vehicle accidents, suicide, and homicide.
  5. Clinton enthusiasm is negatively correlated with mortality rates due to suicide.
  6. No apparent link exists between Trump enthusiasm and STDs, or Trump enthusiasm and violent and property crime.

Sources:

  1. Federal Election Commission (2017). FEDERAL ELECTIONS 2016: Election Results for the U.S. President, the U.S. Senate and the U.S. House of Representatives.
  2. Xu, J.Q., Murphy, S.L., Kochanek, K.D., Bastian, B., Arias, E. (2018). Deaths: Final data for 2016. National Vital Statistics Reports, 67(5). National Center for Health Statistics.
  3. Centers for Disease Control and Prevention (2017). Sexually Transmitted Disease Surveillance 2016. Atlanta: U.S. Department of Health and Human Services.
  4. Rossen, L.M., Bastian, B., Warner, M., Khan, D., Chong, Y. (2017). Drug poisoning mortality: United States, 1999–2016. National Center for Health Statistics.
  5. Hill, H. A., Elam-Evans, L. D., Yankey, D., Singleton, J. A., & Dietz, V. (2016). Vaccination Coverage Among Children Aged 19–35 Months — United States, 2015. MMWR. Morbidity and Mortality Weekly Report, 65(39), 1065–1071.
  6. United States Department of Justice, Federal Bureau of Investigation. (2018). Crime in the United States, 2017.
  7. Bernardo, R. (2018). 2018’s Most & Least Educated States in America. Wallet Hub.
.CSV and .description files available here!

Research fellow in psychiatric and genetic epidemiology at MassGeneral and Broad Institute. Let's talk about OCD, Tourette syndrome, ADHD, and eating disorders.