
Propensity score estimation and visualization

Using a simulated dataset

Photo by Filiberto Santillán on Unsplash

Propensity score (PS)-based methods are often used for confounding adjustment in my field, epidemiology, and across a variety of other scientific fields. In this post, I simulate two small datasets (N=20,000) with two different treatment variables (x1 and x2), with prevalences of 7% and 20%, respectively. I then simulate two different outcomes (y1 and y2) with prevalences of 6% and 16% in the simulated population, respectively. The true effect of the exposures on the outcomes is null. This post is largely inspired by the paper by Desai et al. (2017).

What this post will cover

  • Simulating my own dataset with null associations between two different exposures (x1 and x2) and two outcomes (y1 and y2), giving four exposure-outcome pairs
  • Computing propensity scores (PS) for each exposure and trimming the non-overlapping areas of the PS distributions between the exposed and the unexposed
  • Running several logistic regression models (crude, conventionally adjusted, and SMR-weighted)

In this post, I only share selected code snippets. The version of this post with all code is available here.

Data simulation

I simulate 10 confounders, 2 predictors of treatment only, and 2 predictors of the outcome only.
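As a rough illustration of what such a simulation can look like, here is a minimal sketch in Python. The covariate structure, the coefficients (0.3 and 0.4), the seed, and the helper draw_binary are my own illustrative choices rather than the exact setup behind the results below; the intercepts are tuned so the exposures and outcomes land near the stated prevalences, and the outcomes deliberately do not depend on the exposures, so the true effect is null.

```python
import numpy as np
import pandas as pd
from scipy.optimize import brentq
from scipy.special import expit

rng = np.random.default_rng(2023)   # hypothetical seed
n = 20_000

# 10 confounders, 2 predictors of treatment only, 2 predictors of outcome only,
# drawn as independent standard-normal covariates for simplicity.
conf = rng.normal(size=(n, 10))
treat_only = rng.normal(size=(n, 2))
out_only = rng.normal(size=(n, 2))

def draw_binary(lin_pred, target_prev):
    """Draw a binary variable from a logistic model, tuning the intercept
    so the marginal prevalence lands near target_prev."""
    b0 = brentq(lambda b: expit(b + lin_pred).mean() - target_prev, -20, 20)
    return rng.binomial(1, expit(b0 + lin_pred))

# Exposures depend on the confounders and the treatment-only predictors;
# the coefficients (0.3, 0.4) are illustrative placeholders.
x1 = draw_binary(0.3 * conf.sum(axis=1) + 0.4 * treat_only.sum(axis=1), 0.07)
x2 = draw_binary(0.3 * conf.sum(axis=1) + 0.4 * treat_only.sum(axis=1), 0.20)

# Outcomes depend on the confounders and the outcome-only predictors,
# but NOT on the exposures: the true exposure-outcome effect is null.
y1 = draw_binary(0.3 * conf.sum(axis=1) + 0.4 * out_only.sum(axis=1), 0.06)
y2 = draw_binary(0.3 * conf.sum(axis=1) + 0.4 * out_only.sum(axis=1), 0.16)

df = pd.DataFrame(conf, columns=[f"c{i+1}" for i in range(10)])
df["t1"], df["t2"] = treat_only[:, 0], treat_only[:, 1]
df["o1"], df["o2"] = out_only[:, 0], out_only[:, 1]
df["x1"], df["x2"], df["y1"], df["y2"] = x1, x2, y1, y2
```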

Analyses

The crude analyses of the associations between the exposures and the outcomes suggested weak positive associations with wide 95% confidence intervals around the estimates. The maximum bias was approximately 10%, observed for the less common outcome of the two, y1, with a prevalence of about 6% in the simulated population.
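A crude (unadjusted) estimate can be obtained from a logistic model containing the exposure only. The sketch below assumes the simulated data frame df from above and uses statsmodels; crude_or is a hypothetical helper for pulling out the odds ratio and its 95% confidence interval, not code from the original analysis.

```python
import numpy as np
import statsmodels.api as sm

def crude_or(df, exposure, outcome):
    """Crude (unadjusted) odds ratio with a 95% CI from a logistic model
    that contains only the exposure."""
    X = sm.add_constant(df[[exposure]])
    fit = sm.Logit(df[outcome], X).fit(disp=0)
    odds_ratio = np.exp(fit.params[exposure])
    lo, hi = np.exp(fit.conf_int().loc[exposure])
    return odds_ratio, lo, hi

for x in ("x1", "x2"):
    for y in ("y1", "y2"):
        print(x, y, "OR %.2f (%.2f-%.2f)" % crude_or(df, x, y))
```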

Propensity score

I computed the propensity score (PS) separately for each exposure using the confounders and the predictors of the outcomes.
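One common way to do this is to fit a logistic model for the exposure on the chosen covariates and take the predicted probabilities as the PS. The sketch below assumes the simulated df from above; the covariate list and the histogram check of overlap are illustrative choices, not necessarily the exact specification used for the results reported here.

```python
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Confounders plus the outcome-only predictors; the treatment-only
# predictors are deliberately left out of the PS model.
ps_covariates = [f"c{i+1}" for i in range(10)] + ["o1", "o2"]

def estimate_ps(df, exposure):
    """Propensity score = predicted probability of exposure from a
    logistic model on the PS covariates."""
    X = sm.add_constant(df[ps_covariates])
    fit = sm.Logit(df[exposure], X).fit(disp=0)
    return fit.predict(X)

df["ps_x1"] = estimate_ps(df, "x1")
df["ps_x2"] = estimate_ps(df, "x2")

# Visual check of overlap: PS distribution among exposed vs. unexposed.
fig, ax = plt.subplots()
ax.hist(df.loc[df.x1 == 1, "ps_x1"], bins=50, density=True, alpha=0.5, label="exposed (x1=1)")
ax.hist(df.loc[df.x1 == 0, "ps_x1"], bins=50, density=True, alpha=0.5, label="unexposed (x1=0)")
ax.set_xlabel("propensity score")
ax.legend()
plt.show()
```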

The propensity score (PS) distributions largely overlapped between the treated and the untreated, so trimming the non-overlapping areas did not result in the loss of many data points.
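A simple trimming rule restricts the data to the common range of the PS, i.e. between the larger of the two group minima and the smaller of the two group maxima. The helper below is a hypothetical sketch of that rule, assuming the ps_x1 column estimated above; other trimming rules (e.g. percentile-based) are also used in practice.

```python
def trim_to_overlap(df, exposure, ps_col):
    """Keep only observations whose PS lies inside the region where the
    exposed and unexposed PS distributions overlap."""
    ps_exposed = df.loc[df[exposure] == 1, ps_col]
    ps_unexposed = df.loc[df[exposure] == 0, ps_col]
    lower = max(ps_exposed.min(), ps_unexposed.min())
    upper = min(ps_exposed.max(), ps_unexposed.max())
    return df[df[ps_col].between(lower, upper)]

trimmed_x1 = trim_to_overlap(df, "x1", "ps_x1")
print(f"dropped {len(df) - len(trimmed_x1)} of {len(df)} rows for x1")
```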

Using the propensity score, I computed standardized mortality ratio (SMR) weights and applied them to re-weight the unexposed as if they were exposed. I then compared the crude associations between the exposures and the outcomes, conventionally adjusted logistic regression models, and models adjusted via SMR weighting.
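Under SMR weighting with the exposed as the reference population, the exposed receive a weight of 1 and the unexposed a weight of PS/(1 − PS). The sketch below, again assuming the simulated df, the PS columns, and the trimmed_x1 data frame from the previous sketches, contrasts a conventionally adjusted model with an SMR-weighted one. It uses GLM with freq_weights purely to obtain weighted point estimates; proper confidence intervals would call for a robust or bootstrap variance.

```python
import numpy as np
import statsmodels.api as sm

# Same covariate set as in the PS estimation sketch above.
ps_covariates = [f"c{i+1}" for i in range(10)] + ["o1", "o2"]

def adjusted_or(df, exposure, outcome):
    """Conventional adjustment: exposure plus covariates in one logistic model."""
    X = sm.add_constant(df[[exposure] + ps_covariates])
    fit = sm.Logit(df[outcome], X).fit(disp=0)
    return np.exp(fit.params[exposure])

def smr_weighted_or(df, exposure, outcome, ps_col):
    """SMR weights: 1 for the exposed, PS/(1-PS) for the unexposed, which
    re-weights the unexposed to the covariate distribution of the exposed."""
    w = np.where(df[exposure] == 1, 1.0, df[ps_col] / (1.0 - df[ps_col]))
    X = sm.add_constant(df[[exposure]])
    fit = sm.GLM(df[outcome], X, family=sm.families.Binomial(),
                 freq_weights=w).fit()
    return np.exp(fit.params[exposure])

print("adjusted OR, x1 -> y1:    ", round(adjusted_or(trimmed_x1, "x1", "y1"), 2))
print("SMR-weighted OR, x1 -> y1:", round(smr_weighted_or(trimmed_x1, "x1", "y1", "ps_x1"), 2))
```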

The comparison of three different logistic regression models revealed the following:

  • The crude analyses were imprecise and suggested that associations were present between the exposures (x1, x2) and the outcomes (y1, y2), even though the true associations were null (the maximum bias in the crude analyses was approximately 14.5%).
  • The conventional adjustment in the logistic regression models produced imprecise estimates and suggested weak associations, including inverse associations for some exposure-outcome pairs (maximum bias of approximately 11%).
  • SMR weighting also produced imprecise results and, similarly to conventional adjustment, suggested some weak inverse associations between the exposures and the outcomes (the maximum bias of approximately 7.5% was observed for the x2-y1 exposure-outcome pair).
  • Although imprecision was an issue in all three analyses, the SMR weighting approach arguably produced the point estimates closest to the true (simulated) null effects.

References

  1. Desai RJ, Rothman KJ, Bateman BT, Hernandez-Diaz S, Huybrechts KF. A Propensity-score-based Fine Stratification Approach for Confounding Adjustment When Exposure Is Infrequent. Epidemiology. 2017;28(2):249–257. doi:10.1097/EDE.0000000000000595
