
Modern Machine Learning (ML) algorithms are achieving remarkable results thanks to the continuous development of increasingly complex architectures that can identify patterns beyond human understanding. However, these results come at a price: understanding machine learning predictions is becoming increasingly difficult and in some cases even impossible.
On the other hand, as humans we naturally trust what we can understand. Given that certain application domains require humans to make decisions based on ML predictions (e.g. Fraud Detection, credit risk assessment), it becomes clear how important interpreting black-box ML models is.
The Algorithm
This post proposes an approach to model-agnostic ML interpretability (i.e. one that does not rely on a specific learning algorithm), built on the well-known yet fascinating concept of Genetic Algorithms.
The underlying idea is, given a specific sample whose prediction has to be explained, to generate a new synthetic sample that is as similar as possible to the original one but receives a significantly different prediction. By looking at the differences between the original and the synthetic sample, we can ideally explain why the model is producing that specific prediction. This approach is frequently referred to in the literature as Adversarial or Counterfactual Explanations.
Similar input features but different predictions
Let’s now look at a simple example to clarify the idea. Suppose we have a machine learning classifier able to discriminate between fraudulent and legit items published on an online housing marketplace (you can think of something similar to AirBnB).
All of a sudden, a new item just published on the platform is flagged by the classifier as a possible scam. In this example, the item under consideration is an entire house in the city of Eindhoven, rented out for 600€ a month.

It might not be easy to understand what makes this item suspicious, and that’s why ML explanations are needed.
Let’s now suppose we have a magic tool able to generate a new synthetic item that, despite being really similar to the original one, is evaluated by the same classifier as legit.

As you can see, very similar input features lead to completely different outcomes. It now becomes intuitive that the explanation, namely the driving factor(s) of the suspiciousness, can be generated by comparing the two instances as follows:

So far so good, but how can we generate this new sample to be used for comparison and the consequent explanation? Well, that’s where Genetic Algorithms come in handy.
Genetic Algorithms
Genetic Algorithms (GAs) are a heuristic optimization technique that simulates Darwinian evolution on a specific problem domain.
The key components of the algorithm are:
- Population: a set of instances that goes through the iterative process.
- Fitness: evaluation function which measures how well a sample adapts to the environment and, consequently, how likely it is to survive to the next iteration.
- Elitism: mechanism that preserves the best candidates from one iteration to the next.
- Crossover: the process of creating a new sample for the next generation by mixing the genes (i.e. features) of two parent samples from the current iteration.
- Mutation: random perturbations that alter a sample’s features.
With this in mind, a general optimization procedure works as follows:
- Initializes a random population of N instances
- Evaluates the fitness values of all samples belonging to the population
- Through elitism, brings the best candidates over to the next generation
- Performs crossovers and generates new samples until the next generation’s population is complete
- Applies random mutation to the new population
- Repeats steps 2 – 5 until a stopping criterion is met (e.g. a maximum number of iterations), as sketched in the code below
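To make the procedure concrete, here is a minimal, generic sketch of such a loop in Python. All the callables (`init_population`, `fitness`, `crossover`, `mutate`) and the hyperparameter defaults are hypothetical placeholders for problem-specific implementations, not the exact code used in this project.

```python
import random

def run_ga(init_population, fitness, crossover, mutate,
           population_size=100, n_iterations=50,
           elite_fraction=0.1, mutation_rate=0.05):
    """Generic Genetic Algorithm loop; all callables are problem-specific."""
    population = init_population(population_size)                # 1. random population
    for _ in range(n_iterations):                                # 6. repeat steps 2-5
        ranked = sorted(population, key=fitness, reverse=True)   # 2. evaluate fitness
        n_elite = max(1, int(elite_fraction * population_size))
        next_gen = list(ranked[:n_elite])                        # 3. elitism
        while len(next_gen) < population_size:                   # 4. crossover
            parent_a, parent_b = random.sample(ranked[:population_size // 2], 2)
            next_gen.append(crossover(parent_a, parent_b))
        population = [mutate(ind) if random.random() < mutation_rate else ind
                      for ind in next_gen]                       # 5. random mutation
    return max(population, key=fitness)
```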
The Implementation
Fitness Function
To solve our optimization problem, the fitness function must enforce that the final synthetic instance satisfies the two conditions stated before:
- similar in the input features to the sample whose prediction we need to explain
- significantly different prediction from the one to be explained
Input Features
The overall similarity between samples can be divided into:
- similarity among numerical features, for example, using cosine similarity
- similarity among categorical features, for example, using Jaccard similarity
The higher the feature similarity, the higher the fitness value
An example of this function can be defined as follows:

where X and X’ represent, respectively, the input features of the original instance whose prediction has to be explained and those of a synthetic sample.
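As a minimal code sketch of this component, assuming cosine similarity for the numerical part, Jaccard similarity for the categorical part, and an equal weighting between the two (these choices are assumptions rather than the exact formula above):

```python
import numpy as np

def feature_similarity(x_num, x_cat, x_synth_num, x_synth_cat):
    """Similarity component of the fitness: the closer X' is to X, the higher the value."""
    # Cosine similarity between the numerical feature vectors
    num_sim = float(np.dot(x_num, x_synth_num) /
                    (np.linalg.norm(x_num) * np.linalg.norm(x_synth_num) + 1e-12))
    # Jaccard similarity between the sets of (feature index, value) pairs
    cats, cats_synth = set(enumerate(x_cat)), set(enumerate(x_synth_cat))
    cat_sim = len(cats & cats_synth) / len(cats | cats_synth)
    return (num_sim + cat_sim) / 2  # equal weighting of the two parts is an assumption
```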
Predictions
The second contribution to the fitness value of a sample is given by the difference between the prediction to be explained and the one generated by the same ML model on the synthetic sample. Here, the term prediction denotes the output of the ML algorithm: either a probability if we are dealing with a classifier, or a continuous value if we are solving a regression problem.
The larger the prediction discrepancy, the higher the fitness value.
Taking a binary classification problem as an example, this part of the fitness function can be implemented as follows:

where Δmax is the theoretical maximum discrepancy achievable given the desired probability t for synthetic samples, and p(X’) is the predicted probability produced by the classifier for the synthetic sample X’. If, for example, we are interested in explaining the prediction of an instance belonging to the positive class, these formulas can be simplified accordingly.
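As a minimal sketch (not necessarily the exact formula of the original work), assuming Δmax = |p(X) − t| and the discrepancy measured as |p(X) − p(X’)|, this component could be implemented as:

```python
def prediction_fitness(p_original, p_synth, target_prob):
    """Prediction-discrepancy component of the fitness for a binary classifier."""
    delta_max = max(abs(p_original - target_prob), 1e-12)  # maximum achievable discrepancy
    discrepancy = min(abs(p_original - p_synth), delta_max)
    return discrepancy / delta_max                          # normalized to [0, 1]
```

For an instance predicted as positive (p(X) close to 1) with a target probability t of 0, this reduces to roughly 1 − p(X’).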
The final fitness value can be obtained as a combination of the individual components, for example in the form of a weighted sum.
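For instance, a minimal sketch combining the two components sketched above (the weights are arbitrary choices, not the ones used in the original work):

```python
def overall_fitness(x_num, x_cat, x_synth_num, x_synth_cat,
                    p_original, p_synth, target_prob,
                    w_sim=0.5, w_pred=0.5):
    """Weighted sum of the similarity and prediction-discrepancy components."""
    sim = feature_similarity(x_num, x_cat, x_synth_num, x_synth_cat)
    pred = prediction_fitness(p_original, p_synth, target_prob)
    return w_sim * sim + w_pred * pred
```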
Explanations Generation
Once the optimization procedure is completed, the candidate with the highest fitness value in the final population is used as a counterfactual example to generate the explanation.
The candidate with the highest fitness value represents the counterfactual instance
However, the process is not finished yet. In order to make the counterfactual comparison easily comprehensible for humans, instances must be summarised using just a meaningful subset of features, namely those that are most important for the explanation. This importance is established with an intuitive heuristic: the more often a feature assumes values different from the original one across the final population, the more important it is.
The more a feature has changed, the more important it is
Mathematically, this concept has been encoded using the following formulas:
Where X is the original instance, X’ is a synthetic sample, f is a feature, |f| is the cardinality of feature f (i.e. the number of different values of that feature), and P represents the population at the end of the evolutionary process. As a result, g is a simple binary function that evaluates to 1 only when a feature assumes different values between the original instance and the synthetic one. The final importance imp is then calculated by counting how many times each feature has changed from its original value, normalized by a cardinality factor (high-cardinality features are more likely to assume different values). Please note that, when dealing with high-cardinality numerical features, a discretization step may be needed in order to obtain meaningful importance values.
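A minimal sketch of this heuristic, assuming instances are represented as dictionaries mapping feature names to values and that the cardinality factor is applied as a simple division by |f| (an assumption):

```python
from collections import Counter

def feature_importance(original, final_population, cardinalities):
    """original: dict feature -> value for the instance being explained.
    final_population: list of dicts (synthetic samples of the last generation).
    cardinalities: dict feature -> number of distinct values |f|."""
    changes = Counter()
    for synth in final_population:
        for f, value in original.items():
            if synth[f] != value:       # g(X, X', f) = 1 when the feature has changed
                changes[f] += 1
    # Normalize by the cardinality factor: high-cardinality features change more easily
    return {f: changes[f] / cardinalities[f] for f in original}
```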
Final explanations are then generated as a list of counterfactual statements, one for each of the most important features. Each statement is composed of the feature’s name, the feature’s value in the original instance, and the feature’s value in the best synthetic sample.
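For illustration, and assuming the hypothetical helpers sketched above, the statements could be assembled like this (the wording template is an assumption modeled on the example in the next section):

```python
def build_explanation(original, best_synth, importance, top_k=5):
    """Counterfactual statements for the top_k most important features."""
    ranked = sorted(importance, key=importance.get, reverse=True)[:top_k]
    return [f"{feat} is {original[feat]} INSTEAD of {best_synth[feat]}"
            for feat in ranked]
```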
How does it look in practice?
Let’s now look at a business case where an ML classifier is deployed to perform Fraud Detection in online marketplaces [1]. In this scenario, given the sensitivity of the domain, we are not only interested in the final classification outcome but also in understanding the driving factors behind the suspiciousness that the AI has identified.
The following is a real example of how ML explanations generated with this approach look:
The model thinks this item is suspicious because:
- It has been published right after user's registration INSTEAD of after a week
- Item's Price is 200$ cheaper than average INSTEAD of having a value similar to average
- Item's description is empty INSTEAD of being at least 1000 characters long
- It has been published using an unknown OS INSTEAD of Mac-OS
- It has been posted in July INSTEAD of January
Conclusions
To conclude, here is a summary of our process and how we implemented and evaluated this approach with respect to the Fraud Detection business case mentioned above:
First, ML explanations proved to enhance humans’ ability to understand predictions, resulting in a faster decision-making process and better accuracy of the decisions made [1];
On top of that, Adversarial Explanations generated using the evolutionary approach proposed here achieved better performance [1] than LIME [2] and performance comparable to SHAP [3], widely acknowledged as the go-to algorithm when it comes to model-agnostic ML explanations.
Credits
I am thankful to HousingAnywhere for giving me the opportunity to develop this algorithm as part of my Data Science internship in the company. Special thanks to Massimo Belloni for his contribution and support throughout the whole project.
References
[1] F. Piccinini, M. Belloni, E. Della Valle and M. Pechenizkiy, Enhancing Fraud Detection Through Interpretable Machine Learning (2020), Politecnico di Milano & Eindhoven University of Technology
[2] M. T. Ribeiro, S. Singh and C. Guestrin, "Why Should I Trust You?": Explaining the Predictions of Any Classifier (2016), Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
[3] S. Lundberg and S. Lee, A Unified Approach to Interpreting Model Predictions (2017), NeurIPS