fairmodels: let’s fight with biased Machine Learning models

part 3: mitigation

Jakub Wiśniewski
Towards Data Science

image by author

TL;DR

The R package fairmodels facilitates bias detection through model visualizations. It implements a few mitigation strategies that can reduce bias. It enables easy-to-use checks of fairness metrics and comparisons between different Machine Learning (ML) models.

Long version

Bias mitigation is an important topic in the field of Machine Learning (ML) fairness. For Python users, such algorithms are already implemented, well explained, and documented (see AIF360). fairmodels provides implementations of a few popular, effective bias mitigation techniques ready to make your model fairer.

I have a biased model, now what?

Having a biased model is not the end of the world. There are lots of ways to deal with it, and fairmodels implements various algorithms to help you tackle the problem. First, I should describe the difference between pre-processing and post-processing algorithms.

  • Pre-processing algorithms work on the data before the model is trained. They try to mitigate the bias between the privileged and unprivileged subgroups by changing the data itself.
  • Post-processing algorithms change the output of a model explained with DALEX so that it does not favor the privileged subgroup as much.

How do these algorithms work?

In this section, I will briefly describe how these bias mitigation techniques work. Code for more detailed examples and some visualizations used here may be found in this vignette.

Pre-processing

Disparate impact remover (Feldman et al., 2015)

(image by author) Disparate impact removing. The blue and red distributions are transformed into the “middle” distribution.

This algorithm works on numeric, ordinal features. It changes the column values so that the distributions for the unprivileged (blue) and privileged (red) subgroups are close to each other. In general, we would like our model to judge not by the raw value of a feature but by its percentile (e.g., hiring the top 20% of applicants from each subgroup). The algorithm finds the distribution that minimizes the earth mover’s distance. In simple words, it finds the “middle” distribution and maps each subgroup’s values of this feature onto it.
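In fairmodels this technique is available through the disparate_impact_remover() function. Below is a minimal sketch of how it could be applied to the adult dataset used later in this post; the column names passed to features_to_transform are assumed examples of numeric features.

library(fairmodels)

data("adult")

# transform chosen numeric columns so that the distributions for each sex
# are moved towards a common "middle" one; lambda = 1 means full repair,
# lambda = 0 leaves the data unchanged
adult_repaired <- disparate_impact_remover(
  data                  = adult,
  protected             = adult$sex,
  features_to_transform = c("age", "hours_per_week"), # assumed example columns
  lambda                = 1
)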

Reweighting (Kamiran et al., 2012)

(image by author) In this mockup example, S=1 is a privileged subgroup. There is a weight for each unique combination of S and y.

Reweighting is a simple but effective tool for minimizing bias. The algorithm looks at the protected attribute and at the true label. It then calculates the probability of assigning the favorable label (y=1) assuming the protected attribute and y are independent. Of course, if there is bias, they will be statistically dependent. The algorithm then divides this theoretical probability by the true, empirical probability of the event. That is how a weight is created. From these 2 vectors (the protected variable and y) we obtain a weight for each observation in the data and pass the weights to the model. Simple as that. However, some models do not have a weights parameter and therefore cannot benefit from this method.
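As a rough sketch of how this could look in practice (with the adult data prepared the same way as in the post-processing example below), the weights returned by fairmodels’ reweight() can be passed straight to a model that accepts a weights argument, such as gbm:

library(gbm)
library(fairmodels)

data("adult")
adult$salary <- as.numeric(adult$salary) - 1 # 0/1 target
protected <- adult$sex
adult <- adult[colnames(adult) != "sex"] # drop sex so the model cannot use it directly

# one weight per observation, computed from the protected variable and the label
weights <- reweight(protected = protected, y = adult$salary)

set.seed(1)
gbm_weighted <- gbm(salary ~ ., data = adult,
                    weights = weights,
                    distribution = "bernoulli")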

Resampling (Kamiran et al., 2012)

(image by author) Uniform resampling. Circles denote duplicated observations and x’s denote omitted ones.

Resampling is closely related to the previous method, as it implicitly uses reweighting to calculate how many observations must be omitted or duplicated in each case. Imagine there are 2 groups: deprived (S = 0) and favored (S = 1). This method duplicates observations from the deprived subgroup when the label is positive and omits its observations with a negative label; the opposite is then performed on the favored group. There are 2 types of resampling implemented: uniform and preferential. Uniform resampling picks observations at random (as in the picture), whereas preferential resampling uses the model’s probabilities to pick/omit observations close to the cutoff (default 0.5).
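A minimal sketch with fairmodels’ resample(), again on the adult data: the function returns row indices, and the model is simply refitted on the indexed data. The prior model used to obtain probabilities for the preferential variant is a hypothetical placeholder.

library(fairmodels)

data("adult")
adult$salary <- as.numeric(adult$salary) - 1
protected <- adult$sex

# uniform resampling: returns indices of rows to keep (some duplicated, some dropped)
idx <- resample(protected = protected, y = adult$salary, type = "uniform")
adult_uniform <- adult[idx, ]

# preferential resampling additionally needs probabilities from some prior model:
# probs <- predict(prior_model, adult, type = "response") # hypothetical prior model
# idx <- resample(protected, adult$salary, type = "preferential", probs = probs)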

Post-processing

Post-processing takes place after creating an explainer. To create an explainer we need a model and the DALEX package. A gbm model will be trained on the adult dataset to predict whether a certain person earns more than 50k annually.

library(gbm)
library(DALEX)
library(fairmodels)

data("adult")
adult$salary <- as.numeric(adult$salary) - 1
protected <- adult$sex
adult <- adult[colnames(adult) != "sex"] # sex not specified

# making model
set.seed(1)
gbm_model <- gbm(salary ~ ., data = adult, distribution = "bernoulli")

# making explainer
gbm_explainer <- explain(gbm_model,
                         data = adult[, -1],
                         y = adult$salary,
                         colorize = FALSE)

Reject Option based Classification (pivot) (Kamiran et al., 2012)

(image by author) Red: privileged, blue: unprivileged. If the value falls within (cutoff - theta, cutoff + theta) for a particular case, the probability is moved (and its value changed) to the opposite side of the cutoff.

ROC pivot is implemented based on Reject Option based Classification. The algorithm switches labels if an observation is from the unprivileged group and lies on the left of the cutoff; the opposite is performed for the privileged group. There is an assumption, however, that the observation must be close (in terms of probability) to the cutoff, so the user must provide a value theta that tells the algorithm how close to the cutoff an observation must be for the switch. But there is a catch. If only the labels were changed, the DALEX explainer would have a hard time properly calculating the performance of the model. For that reason, in the fairmodels implementation of this algorithm it is the probabilities that are switched (pivoted) instead of the labels. They are simply moved to the other side of the cutoff, at an equal distance from it.
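A short sketch of how this could be applied to the explainer created above; the theta value here is an arbitrary choice for illustration:

# returns a modified explainer whose probabilities close to the cutoff
# have been pivoted to the other side for the relevant subgroups
gbm_explainer_roc <- roc_pivot(gbm_explainer,
                               protected = protected,
                               privileged = "Male",
                               theta = 0.05) # how close to the cutoff counts as "close"

fairness_check(gbm_explainer_roc,
               protected = protected,
               privileged = "Male",
               label = "gbm_roc_pivot")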

Cutoff manipulation

(image by author) plot(ceteris_paribus_cutoff(fobject, cumulated = TRUE))

Cutoff manipulation might be a great idea for minimizing the bias in a model. We simply choose the metrics and the subgroup for which the cutoff will change. The plot above shows where the minimum is: for that value of the cutoff, the parity loss of the chosen metrics will be the lowest. How do we create a fairness_object with a different cutoff for a certain subgroup? It is easy!

fobject <- fairness_check(gbm_explainer,
                          protected = protected,
                          privileged = "Male",
                          label = "gbm_cutoff",
                          cutoff = list(Female = 0.35))

Now the fairness_object (fobject) is a structure created with the specified cutoff, which affects both the fairness metrics and the performance.
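With the fobject above, the effect can be inspected visually, for example like this (the subgroup argument is my addition; the call shown in the caption above is abbreviated):

# where does the parity loss reach its minimum as the Female cutoff moves?
plot(ceteris_paribus_cutoff(fobject, subgroup = "Female", cumulated = TRUE))

# fairness check plot for the cutoff-adjusted model
plot(fobject)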

The tradeoff between fairness and accuracy

If we want to mitigate bias, we must be aware of the possible drawbacks. Let’s say that Statistical Parity is the most important metric for us. Lowering the parity loss of this metric will (probably) result in an increase of False Positives, which will cause the accuracy to drop. For the example below (which you can find here), a gbm model was trained and then treated with different bias mitigation techniques.

image by author

The more we try to mitigate the bias, the lower the accuracy we get. This is natural for this metric, and the user should be aware of it.

Summary

The debiasing methods implemented in fairmodels are certainly worth trying. They are flexible, and most of them work with any model. Most of all, they are easy to use.

What to read next?

Learn more

My name is Jakub Wiśniewski. I am a data science student and research software engineer at MI2 DataLab.