FAIRNESS AND BIAS

How to Fix Feature Bias

Choosing a strategy requires testing and tradeoffs

Valerie Carey
Towards Data Science
12 min read · Feb 8, 2021



Feature bias, which reflects measurement errors or biases in human judgment, can negatively impact the fairness of machine learning models. This post discusses five potential strategies for mitigating such bias. The best method is context-dependent, and explainability techniques are essential for ensuring that a solution actually addresses the bias without introducing additional unfairness. This post is a follow-up to my previous one [1], which discussed the shortcomings of one mitigation method.

Feature bias occurs when a feature has a different meaning across groups (e.g. racial or gender categories). This may reflect measurement errors, differences in self-reporting, or biased human judgment. Here are some example scenarios that illustrate the issue:

  1. Men are less likely than women to report a family history of cancer, even if it exists [2]; this difference could lead to under-estimation of risk for males.
  2. The use of prior arrests in models of criminal recidivism has been criticized because arrests do not reflect underlying crimes in a uniform manner. For example, when there are racial differences in detection of crime or the use of arrests vs. warnings, a model can over-estimate risks for some groups [3, 4, 5].
  3. Lower income people may have difficulty accurately reporting employment and earnings, partly because it is difficult to concisely describe “income” when it varies over time or involves multiple sources [6]. Such a survey question may be more ambiguous, and therefore more prone to error, for this group.

It’s crucial to understand your data sources. For example, an income feature may be a low risk for feature bias if it is taken directly from payroll or tax return data, but a very high risk if self-reported. Differences in rates of missing or default values across groups are also red flags. Explainability techniques can help identify potentially risky features [1, 7].
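As a quick check, missing-value and sentinel-value rates can be compared across groups with a few lines of pandas. This is a minimal sketch; the DataFrame `df`, its `group` column, and the zero-income sentinel are hypothetical stand-ins for your own data.

```python
import pandas as pd

# Hypothetical data: `df` has a 'group' column (e.g. sex or race/ethnicity)
# plus feature columns such as 'income'.

# Share of missing values per feature, by group -- large gaps are red flags.
missing_by_group = df.drop(columns="group").isna().groupby(df["group"]).mean()
print(missing_by_group)

# Rate of a suspicious default/sentinel value (here, zero income), by group.
print((df["income"] == 0).groupby(df["group"]).mean())
```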

If a feature is likely to be biased, what can be done about it? This post contains my thoughts on five possible mitigation methods. There is no perfect technique that is appropriate for every situation. Mitigation of feature bias is likely to involve trying several strategies and thoroughly testing results.

Modify Your Data

An obvious strategy for feature bias mitigation is to fix the data. In many scenarios this might be difficult or impossible, but it’s worth considering. Data modification reduces risks of unintended consequences compared to other strategies. It’s also the only technique that works if you don’t know group membership for all cases. In data I’ve worked with, for example, it’s typical to have sex or race/ethnicity information for only a subset of people; fairness can be assessed using these cases, but we want predictions on observations lacking that information.

There can be many ways to adjust the data. It may be possible to drop the biased feature or to substitute information from a more reliable source. If you have control over the feature, you may be able to implement a direct fix. For example, if bias results from wording in a survey you administer, changing the survey and re-collecting data may be the best option. If feature bias affects the extremes of a feature (e.g. the highest or lowest income individuals), thresholding or bucketing could be useful.

If feature bias is strongly linked to group membership, it may also be possible to rescale values, e.g. by using quartiles within groups. A rescaling method was tested in a recent paper [5] and found to perform poorly.
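Below is a minimal sketch of the thresholding, bucketing, and within-group rescaling ideas above, assuming the same hypothetical `df` with 'income' and 'group' columns; the within-group quartile variant is the kind of adjustment tested in [5].

```python
import pandas as pd

# Cap extreme values so that biased tails have less influence.
cap = df["income"].quantile(0.99)
df["income_capped"] = df["income"].clip(upper=cap)

# Coarse buckets (deciles) instead of raw values.
df["income_decile"] = pd.qcut(df["income"], q=10, labels=False, duplicates="drop")

# Rescale within groups: income quartile relative to one's own group, so the
# feature carries the same meaning (relative standing) for every group.
df["income_quartile_within_group"] = df.groupby("group")["income"].transform(
    lambda s: pd.qcut(s, q=4, labels=False, duplicates="drop")
)
```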

Include the Sensitive Feature in the Model (Recap)

I discussed this strategy in detail in the previous blog post [1]. In brief, some argue that including a sensitive feature in a nonlinear model will lead to the model automatically adjusting relative strengths of feature contributions across groups [3]. However, my post argues that such interaction effects are not guaranteed to be incorporated into a model, and unintended consequences can occur when sensitive features make “main-effect-like” contributions to predictions.

Incorporation of a sensitive feature is particularly dangerous when group membership is correlated with other predictors in the model, or with unmeasured characteristics that have a causal relationship with the outcome. This makes the sensitive feature especially likely to be incorporated as a main effect, or as an interaction with unexpected features.

It’s possible to use various explainability techniques to determine whether sensitive feature inclusion has the intended effects [1]. For some data sets, the advantages of this method could outweigh its risks.

Use a Different Model Type

For the sensitive feature inclusion technique, it’s possible that different model types could better capture pairwise interactions. For my prior post [1], I tried two model types, random forest and XGBoost, but reported random forest results because they are easier to interpret. Here, I’ll discuss XGBoost. Although both model types show risks, XGBoost generally mitigates feature bias better than random forest for my test scenario.

In brief, my methodology involved random assignment of a male or female “gender” to cases in a public loans data set, and then reduction of the value of an income feature for females only. Although “gender” had no impact on actual default rates, feature bias in the income predictor led to an over-prediction of loan defaults for females. My post then tested the assumption that adding the female status feature to the model would correct the bias. For details, see the previous post [1]; code is on GitHub [8].
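The sketch below mirrors the setup described above (see [8] for the actual code); `loans` is a hypothetical DataFrame with 'income' and 'default' columns, and the 20% income reduction is an arbitrary illustration, not necessarily the factor used in the original experiment.

```python
import numpy as np

rng = np.random.default_rng(42)

# Randomly assign a synthetic "gender" that has no true link to default.
loans["female"] = rng.integers(0, 2, size=len(loans))

# Inject feature bias: reported income is understated for females only.
loans["income_biased"] = np.where(
    loans["female"] == 1, 0.8 * loans["income"], loans["income"]
)
```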

I examined whether female feature inclusion made the population-level model results more like actual default rates. For random forest, the correction was weak [1]. For XGBoost, I see greater agreement:

Actual default rate compared to predictions from random forest (RF) and XGBoost (XGB) models that include the female indicator feature.
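A comparison like the one in the figure can be produced by averaging actual outcomes and predicted probabilities per group; `y_true`, `y_prob`, and `female` below are hypothetical arrays aligned with the test set.

```python
import pandas as pd

# Actual vs. predicted default rates, by group.
rates = pd.DataFrame({"actual": y_true, "predicted": y_prob, "female": female})
print(rates.groupby("female")[["actual", "predicted"]].mean())
```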

Population-level differences are also apparent in aggregated Shapley value plots. Such plots can identify features driving differences across groups [7, 9]; here I am showing how much of the difference in default rate for females vs. males is due to each feature. I use these plots to compare models with and without a sensitive feature:

Aggregated Shapley values for females compared to a male reference for random forest (left) and XGBoost (right) models without and with the female indicator feature. Image by author.

Focusing first on the gray bars, it’s clear that feature bias affects model predictions via the income feature, as expected. Both model types are affected similarly.

Differences are apparent when the female indicator is introduced into the model (orange bars in the plots). The compensating effect of this feature is much greater for XGBoost than for the random forest model. This is consistent with better population-level bias correction for XGBoost.

For XGBoost only, the income bar changes markedly for the model including female status, indicating a greater impact of this feature. This is also apparent in global importances; without the female feature, income is the third most important feature by permutation importance, but it becomes the most important when female status is included. This increase may be expected, given that feature bias reduces the correlation between income and default status. The model with the biased feature is probably under-estimating the impact of income.
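Here is a sketch of how the aggregated Shapley comparison and the permutation importances could be computed with the shap and scikit-learn libraries; `model`, `X`, `y`, and the `female` indicator array are hypothetical stand-ins for the fitted XGBoost model and (ideally held-out) data above.

```python
import numpy as np
import shap
from sklearn.inspection import permutation_importance

# Per-row Shapley values for the fitted XGBoost model.
shap_values = shap.TreeExplainer(model).shap_values(X)   # shape (n_rows, n_features)

# Mean Shapley value per feature for females minus the male reference;
# positive values push female predictions higher than male predictions.
female_mask = np.asarray(female) == 1
diff = shap_values[female_mask].mean(axis=0) - shap_values[~female_mask].mean(axis=0)
for name, d in sorted(zip(X.columns, diff), key=lambda t: -abs(t[1])):
    print(f"{name:>25s}: {d:+.4f}")

# Global permutation importances, to compare models with and without the
# female indicator feature.
perm = permutation_importance(model, X, y, scoring="roc_auc",
                              n_repeats=10, random_state=0)
for name, imp in sorted(zip(X.columns, perm.importances_mean), key=lambda t: -t[1]):
    print(f"{name:>25s}: {imp:.4f}")
```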

I am not an expert in these algorithms, but the way I think of the difference is that XGBoost provides a directed search, whereas the random forest model covers a greater diversity of solutions. If we have a strong predictor that is correlated with another feature, the random forest will sample some solutions involving the stronger predictor, and some using the weaker predictor. However, XGBoost will not incorporate the weaker predictor if a solution has been found involving the stronger predictor, assuming the weaker predictor has no independent effects. Therefore, XGBoost models tend to rely on a smaller group of stronger features. The income feature in the XGBoost model is less diluted with correlated information, and we see stronger effects.

All this suggests that XGBoost is better at adjusting for feature bias when the sensitive feature is included in the model, compared to random forest. However, population-level responses don’t guarantee fairness in individual cases. We may be adjusting for bias in a non-specific manner, for example by uniformly reducing risks for all females, rather than just those whose biased incomes place them at risk.

In the example scenario, female status should act only through interactions with income. Any “main-effect-like” behavior is essentially offsetting feature bias with stereotyping, not correcting bias. We can assess the extent of “main-effect-like” and second-order effects using Accumulated Local Effects (ALE) plots [10]. For the random forest model, ALE plots showed comparable main effects and second-order effects, indicating that some degree of non-specific adjustment was occurring [1].

For XGBoost, ALE plots show stronger overall effects for income and female status than were seen for random forest [1]. However, the one-way and two-way magnitudes are again comparable, indicating that the “main-effect-like” and interaction contributions are similar:

Left: One-way ALE plot for the female feature in the XGBoost model. Right: Two-way ALE plot for income and female status for the model. Image by author.
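One-way ALE plots like the left panel can be produced with, for example, the alibi library (an assumption; the post does not state which implementation it used). Second-order ALE, as in the right panel, requires a separate implementation and is omitted here. `model` and `X_train` are hypothetical.

```python
from alibi.explainers import ALE, plot_ale

# Explain the positive-class (default) probability.
predict_fn = lambda X: model.predict_proba(X)[:, 1]

feature_names = list(X_train.columns)
ale = ALE(predict_fn, feature_names=feature_names)
exp = ale.explain(X_train.to_numpy())

# One-way ALE for the female indicator (left panel above).
plot_ale(exp, features=[feature_names.index("female")])
```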

By examining trees in the XGBoost model, I confirm that there are decision paths that involve the female feature but not the income feature. Therefore, stereotyping risks remain.

Unlike for the random forest model, I see evidence that the XGBoost model incorporates spurious interactions. When Friedman’s H statistic [11] is used to screen interactions, I see a relatively large value for female status with loan amount. ALE plots also suggest this interaction occurs (not shown). Loan amount is correlated with income (Spearman coefficient 0.44 for unbiased income, 0.36 for biased). By incorporating this interaction, the XGBoost model may be up-weighting a related feature for females to compensate for an unreliable income. However, this is an indirect correction for feature bias and is likely to be inaccurate for some cases.
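The H statistic can be approximated by brute force from partial dependence functions evaluated at a sample of the observed data points, following the definition in [11]. This is a sketch under the assumption of a fitted classifier `model` with `predict_proba` and a NumPy feature matrix `X`; dedicated implementations exist, and I have not verified which one was used for the results above.

```python
import numpy as np

def centered_pd(model, X, feature_idx):
    """Centered partial dependence of the positive-class probability for the
    given feature(s), evaluated at each observation's own feature value(s)."""
    pd_vals = np.empty(len(X))
    for i in range(len(X)):
        X_mod = X.copy()
        X_mod[:, feature_idx] = X[i, feature_idx]   # fix feature(s) at row i's value
        pd_vals[i] = model.predict_proba(X_mod)[:, 1].mean()
    return pd_vals - pd_vals.mean()

def friedman_h2(model, X, j, k, n_sample=300, seed=0):
    """Approximate Friedman's H^2 for the interaction between features j and k."""
    rng = np.random.default_rng(seed)
    Xs = X[rng.choice(len(X), size=min(n_sample, len(X)), replace=False)]
    pd_j = centered_pd(model, Xs, [j])
    pd_k = centered_pd(model, Xs, [k])
    pd_jk = centered_pd(model, Xs, [j, k])
    return np.sum((pd_jk - pd_j - pd_k) ** 2) / np.sum(pd_jk ** 2)
```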

Fairness metrics such as false positive and false negative rates improve when the sensitive feature is incorporated into an XGBoost model. For example, the false positive rate for females was nearly 31% higher than for males for the model not including female status, a gap that decreased to about -2% when the feature was incorporated. The raw rate of false positives was similar for models with and without the sensitive feature. In contrast, for the random forest test, addition of the sensitive feature increased overall false positive rates significantly; although the gap between males and females narrowed somewhat, error rates for both genders increased.
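Group-wise error rates of this kind are straightforward to compute; here is a sketch with hypothetical `y_true`, `y_pred` (0/1 predictions at the chosen threshold), and `female` arrays.

```python
import numpy as np

y_true, y_pred, female = map(np.asarray, (y_true, y_pred, female))

def false_positive_rate(y, yhat):
    """FP / (FP + TN) for 0/1 arrays."""
    negatives = y == 0
    return np.mean(yhat[negatives] == 1)

fpr_f = false_positive_rate(y_true[female == 1], y_pred[female == 1])
fpr_m = false_positive_rate(y_true[female == 0], y_pred[female == 0])
print(f"FPR female: {fpr_f:.3f}, FPR male: {fpr_m:.3f}, "
      f"relative gap: {fpr_f / fpr_m - 1:+.1%}")
```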

Based on the degree of under-correction and modest or nonexistent improvements to fairness and performance metrics, I had concluded that addition of a sensitive feature was probably not better than nothing for my random forest example [1]. However, large improvements in population-level metrics for the XGBoost model might justify this solution, at least for my scenario.

In sum, for my simple example, using an XGBoost model increases the effectiveness of the sensitive feature inclusion technique, compared to a random forest model. However, stereotyping risks remain, and I see unexpected interactions. I only tested two model types for this project; it may be that other types would better incorporate pairwise interactions.

Create an Explicit Interaction Term

A possible modification to the data set would be to create a feature that is the product of income and female status; the feature value is 0 for all males and the same as the annual income for females. The hope is that modeling with an explicit interaction would reduce the risk of main-effect-like contributions. However, in a tree-based model, the interaction feature can easily be incorporated as an indicator when split points are near zero, so it’s unclear that this benefit would occur.
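Creating the interaction term itself is a one-liner; the column names below are hypothetical and match the biased-income setup sketched earlier.

```python
# Zero for males, equal to (biased) income for females.
loans["income_x_female"] = loans["income_biased"] * loans["female"]
```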

When I test the interaction feature in the random forest model, demographic parity improves slightly compared to using the female indicator. The male-female gap in predicted defaults decreases from 1.1% to 0.8%; predictions are still under-corrected. The behavior of fairness metrics is (in my opinion) slightly better for random forest models with the explicit interaction feature than for models using the female indicator. The false positive rate improves, and the gap between males and females decreases. False negatives rise overall but become more equal.

For XGBoost, there is no meaningful improvement in demographic parity or fairness metrics when using an interaction term instead of the female indicator.

Identification of potential main effects of female status is difficult in this scenario, as we don’t have a convenient ALE plot for the “gender part” of the interaction. Inspection of trees provides some information, as decision paths that do not contain income but that split on the income interaction with a very low threshold value are likely to reflect main-effect-like contributions or spurious interactions.

For random forest, decision paths not involving income, but containing the interaction feature with a threshold of less than $5,000, occur about 11 times in the average tree. This is 1.6% of paths, or slightly less than the 2.4% that occurred in the model that used the female indicator [1]. The XGBoost version of the model also shows such decision paths.
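For the random forest, counts like these can be obtained by walking each tree and flagging root-to-leaf paths that split on the interaction feature at a low threshold without ever splitting on income. This is a sketch against scikit-learn's tree internals; the feature indices and the $5,000 threshold come from the scenario above.

```python
def low_interaction_path_fraction(forest, income_idx, interaction_idx, thresh=5000.0):
    """Fraction of root-to-leaf paths across a fitted sklearn random forest that
    split on the interaction feature below `thresh` but never split on income."""
    flagged, total = 0, 0
    for est in forest.estimators_:
        t = est.tree_
        stack = [(0, [])]                     # (node id, splits seen so far)
        while stack:
            node, splits = stack.pop()
            if t.children_left[node] == -1:   # leaf: evaluate the completed path
                total += 1
                uses_income = any(f == income_idx for f, _ in splits)
                low_inter = any(f == interaction_idx and thr < thresh
                                for f, thr in splits)
                flagged += int(low_inter and not uses_income)
                continue
            path = splits + [(t.feature[node], t.threshold[node])]
            stack.append((t.children_left[node], path))
            stack.append((t.children_right[node], path))
    return flagged / total
```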

Overall, incorporation of the income-female interaction feature results in a model that, at a high level, resembles the model involving the female status indicator. For the random forest case, the interaction solution has marginally better performance, but significant under-correction and risks of unintended consequences remain. In addition, the interaction feature solution is more difficult to interpret.

Build a Separate Model for Each Group

Another suggested feature bias solution is to create separate models by group [3]. I didn’t test this option, but I expect it would perfectly fix feature bias for my scenario. However, this fix depends a lot on how I set up my example.

First, I have reasonable counts in both the male and female groups. In real data, some groups may be significantly under-represented, in which case models may be very different, or perform poorly, for low-volume groups. Representation levels in the data matter for the other solutions also, but separate models are particularly vulnerable.

Like the sensitive feature incorporation technique, separate models become risky in scenarios that are more complex than my test. Female status acts only via the income feature in my example, and so I can reasonably expect that if I build two models, they would be very similar except for that feature. However, if female status is correlated with other features, or with a causal effect not present in the data, we could wind up with very different models by group, which could be difficult to justify.
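Below is a minimal sketch of what the per-group approach could look like (again, not something tested in this post); note that it requires knowing group membership for every case at scoring time, and that each group needs enough data to support its own model. `X_train`, `y_train`, and `group_train` are hypothetical pandas objects.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Fit one model per group, using only that group's rows.
models = {
    g: RandomForestClassifier(random_state=0).fit(X_train.loc[idx], y_train.loc[idx])
    for g, idx in X_train.groupby(group_train).groups.items()
}

def predict_proba_by_group(X, groups):
    """Score each row with the model fitted for its own group."""
    parts = [
        pd.Series(models[g].predict_proba(X.loc[idx])[:, 1], index=idx)
        for g, idx in X.groupby(groups).groups.items()
    ]
    return pd.concat(parts).reindex(X.index)
```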

Final Thoughts

I have briefly discussed five strategies for mitigation of feature bias. I hope and imagine there are additional techniques I have not considered (please share in the comments).

As I wrote in my previous post [1], there is no “free lunch” for feature bias. Some solutions may be infeasible in some contexts, and data set characteristics and model type strongly influence effectiveness. For one test scenario, I’ve demonstrated much better mitigation from sensitive feature inclusion in an XGBoost model than in a random forest model. In addition, I saw that introducing an interaction term is a slightly better mitigation than sensitive feature inclusion for random forest, but the two methods were equivalent for XGBoost. Details of machine learning algorithms may help suggest solutions, but at this point I would likely rely on trial, error, and testing to select a technique.

Thankfully, explainability techniques and fairness metrics can help answer questions such as: To what extent is a solution correcting for feature bias? Who might benefit or be at risk? Which features are affected by our changes? Is this technique better than doing nothing? In this way, we can make informed decisions and anticipate negative consequences, even in the absence of perfect fixes.

References

[1] V. Carey, No Free Lunch for Feature Bias (2021), Towards Data Science

[2] M. Sieverding, A.L. Arbogast, S. Zintel, and C. von Wagner, Gender differences in self‐reported family history of cancer: A review and secondary data analysis (2020), Cancer Medicine, 9:7772–7780.

[3] S. Corbett-Davies and S. Goel, The Measure and Mismeasure of Fairness: A Critical Review of Fair Machine Learning (2018), Working paper (arXiv.org).

[4] W. D. Heaven, Predictive policing algorithms are racist. They need to be dismantled (2020), MIT Technology Review.

[5] J. L. Skeem and C. Lowenkamp, Using Algorithms to Address Trade-Offs Inherent in Predicting Recidivism (2020), Behavioral Sciences & the Law, forthcoming.

[6] N. A. Mathiowetz, C. Brown, and J. Bound, Chapter 6: Measurement Error in Surveys of the Low-Income Population (2002), Welfare Populations: Data Collection and Research Issues, edited by M. Ver Ploeg, R. A. Moffitt, and C. F. Citro.

[7] S. Lundberg, Explaining Measures of Fairness (2020), Towards Data Science.

[8] V. Carey, GitHub repository, https://github.com/vla6/Stereotyping_ROCDS.

[9] V. Carey, Fairness Metrics Won’t Save You from Stereotyping (2020), Towards Data Science.

[10] C. Molnar, 5.3 Accumulated Local Effects (ALE) Plot (2018), Interpretable Machine Learning: A Guide for Making Black Box Models Explainable.

[11] C. Molnar, 5.4 Feature Interaction (2018), Interpretable Machine Learning: A Guide for Making Black Box Models Explainable.
