Model Interpretability
By Ofir Magdaci, Data Scientist at Wix.com

In the previous post, we got to know a word-embedding representation of the global language of football. In this article, we will develop explainers on top of these embeddings, understand what aspects the model was able to capture, and elaborate on its results.
All code used for this work is available in the Football2Vec library on GitHub (mainly the explainers and [explain](https://github.com/ofirmg/football2vec/blob/master/lib/explain.py) modules).
Prior knowledge
This work is heavily based on my first post – Embedding the Language of Football Using NLP. It is recommended to read it first as it provides context.
The dataset
The data used in this work is based on the StatsBomb open dataset. Each match in the dataset consists of team metadata, competition metadata (e.g., stage, stadium, etc.), and most importantly, manually collected and labeled event data. Documentation is available on the dataset’s GitHub repository.
Motivation & recap
In the previous post, I presented Action2Vec, a Word2Vec model that allowed us to embed the semantics of the football language in a 32-dimensional space. In addition, I introduced PlayerMatch2Vec, a Doc2Vec model that produces 32-dimensional vectors representing a player within a specific match. Finally, I presented the Player2Vec model, obtained by simply averaging a player's PlayerMatch2Vec representations. Here is how it looked:
In this article, we will dig deeper into these models. We will explore various techniques to understand them, analyze them and explain their outputs. Our efforts will focus both on explaining specific results and on the model as a whole. But before we do that, let’s first grasp what explainability means and its importance to machine learning models. Wikipedia addresses it in the context of artificial intelligence:
Explainability: "Explainable AI is artificial intelligence (AI) in which the results of the solution can be understood by humans. It contrasts with the concept of the ‘black box’ in machine learning, where even its designers cannot explain why an AI arrived at a specific decision."
Explainers can play a crucial role in decision-making, debugging, and detecting bias, especially for unsupervised problems such as ours (since we don’t have a downstream task). They are a great way to check what the models learned and whether these aspects are relevant to the domain. But perhaps the strongest motivation of all lies in the fact that most people will not trust a model they cannot explain or understand.
Complex models such as Doc2Vec can capture patterns that simple models, such as Logistic Regression (LR), can’t. However, LR is a much more explainable model, allowing direct access to the features’ coefficients and understanding their importance in any prediction. Closing this explainability gap is what this post is all about.
It is common to divide the explainers into two main types:
- **Local explainers**, which aim to explain individual predictions or outputs.
- **Global explainers**, which describe the complete behavior of the model, shedding light on the big picture and validating the model as a whole.
I will focus on four explainability methods, both local and global, that I found the most informative and reliable in practice: representation-based explainers, analogies, similarities, and creating players’ variations.
Apart from explaining predictions, these methods will allow us to build complex profiles for targeting players in the transfer market. For example, we can search for a player like Antoine Griezmann but who completes more dribbles, or like Philippe Coutinho but with better play on his weaker foot.
Understanding the semantic meaning of the representation’s dimensions
First, we will inspect the players’ vectors and explore possible semantic patterns. Since each dimension is somewhat associated with players’ attributes, it is possible to infer their latent meaning by comparing players with high and low values in each of the 32 dimensions.
Some patterns are quite visible. For example, dimension #21 seems to be correlated with some attacking aspects of the game. However, the best practice will be to involve a domain expert with much deeper and more extensive knowledge about the football domain in this process.
This method acts as a global explainer for how the model places players in the semantic space. This explainer’s output is a set of explainable global features we can use to infer the player’s style of play or to understand similarities between players.
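As a rough illustration of this inspection, here is a minimal sketch, assuming we already have a mapping from player names to their 32-dimensional Player2Vec vectors; the `load_player_vectors` helper is hypothetical and stands in for loading the artifacts produced by the Football2Vec library:

```python
import numpy as np

# Hypothetical loader: returns a dict mapping player name -> 32-dimensional vector.
player2vec = load_player_vectors()

names = list(player2vec.keys())
matrix = np.vstack([player2vec[name] for name in names])  # shape: (n_players, 32)

for dim in range(matrix.shape[1]):
    order = np.argsort(matrix[:, dim])
    lowest = [names[i] for i in order[:5]]    # players with the lowest values in this dimension
    highest = [names[i] for i in order[-5:]]  # players with the highest values in this dimension
    print(f"Dimension #{dim}: low -> {lowest} | high -> {highest}")
```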
Explainers for actions – action analogies
Word analogies: explaining actions with actions
An analogy is an operation that describes a semantic relation between words within a language. This relation stands for a semantic attribute incorporated within the language. So, in a sense, **analogies can serve as a tool to investigate model semantics**.
The basic structure of an analogy is: Word A1 → Word A2 ~ Word B1 → Word B2, meaning Word A1 relates to A2 as B1 relates to B2. All words that fit the relation are semantically similar.
One famous analogy in English language models is "king to queen ~ male to female". It demonstrates the relation of gender in the English language, as well as the status of royalty. This analogy cannot be mimicked in languages that ignore gender – for example, our language of football. Mathematically, the analogy implies that the vector difference between king and queen is about the same as the difference between male and female.
What aspects of football was the model able to capture? Let’s use analogies to find out! I will explain the rationale for the first example and use the same notations for all the other examples. To clarify, all analogies are NOT cherry-picked.
Full analogy example: pass direction learning
As mentioned, each analogy of the form A to A’ ~ B to B’ defines a relation, where ‘to’ stands for the distance obtained by subtracting the vectors. In this example, the relations A – A’ and B – B’ represent a transformation of the pass angle. Using the analogy, we are essentially asking what the result will be (according to the model) if we apply the same transformation as in A to A’ on another given action, B. Hence, we can describe the analogy as follows: A – A’ ~ B – ?
For the sake of the analysis, we will go the other way around: given the relation A – A’ and an action B’, we would like to find what the original action B was, using the same relation. After some basic algebra, the analogy is met when looking for the most similar action to the phrase: A – A’ + B’.
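In gensim terms, solving the analogy boils down to a single `most_similar` call over the Action2Vec vocabulary. Here is a minimal sketch, assuming an Action2Vec (Word2Vec) model saved to disk; the path is a placeholder and the token strings must exist in the trained vocabulary:

```python
from gensim.models import Word2Vec

model = Word2Vec.load("models/action2vec.model")  # placeholder path

A       = "(3/5,3/5):( → )|ground-long|left_foot"  # a long ground pass to the right
A_prime = "(3/5,3/5):( ← )|ground-long|left_foot"  # the same pass, to the left
B_prime = "(2/5,3/5):( ← )|low-long|left_foot"     # another pass to the left

# B ~ A - A' + B': the actions closest to the analogy by cosine similarity.
for token, cosine in model.wv.most_similar(positive=[A, B_prime], negative=[A_prime], topn=3):
    print(token, round(cosine, 3))
```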
In the case of the pass direction analogy, we will select a random pass A from our vocabulary and derive the same pass with the opposite direction – A’. This single change reflects the analogy relation. We will then select a second random pass with the same direction as A’ – that is B’.
- A = ‘(3/5,3/5):( → )|ground-long|left_foot’ – a long ground pass to the right from the middle of the pitch.
- A’ = ‘(3/5,3/5):( ←)|ground-long|left_foot’ – the same pass, but to the left.
- B’ = ‘(2/5,3/5):( ← )|low-long|left_foot’ – a long low pass to the left, from a slightly different position.
Given these inputs, the three most similar actions to the analogy, i.e., the best actions to fit B are:
- ‘(2/5,3/5):( → )|high-med|left_foot’, cosine similarity=0.761 – a high medium-length pass to the right, from the same position as B’.
- ‘(2/5,3/5):( → )|high-long|left_foot’, cosine=0.760 – a high long pass to the right, from the same position as B’.
- ‘(2/5,3/5):( → )|ground-long|left_foot’, cosine=0.748 – a long ground pass to the right, from the same position as B’.

So what do we learn from this? First, it seems that the best fit, i.e., the action with the highest similarity, is indeed a pass in the opposite (forward) direction from the same position, only higher and of medium length. The second best fit is the same high pass, but long. The third most similar word is essentially the mirrored pass – the same pass as B’ with the opposite direction. Interesting indeed!
But wait, is it a problem that the mirrored action, which is expected to be the most intuitive to fit the analogy, was only ranked third? Well, in my opinion, not at all:
- Many actions appear very few times in the data, making it difficult to properly place them in the semantic space. Honestly, it is not a real data-science article if it doesn’t say at least once that "more data will yield even better results".
- All three candidates may fit this analogy well as they achieved very high cosine similarity values, outmatching thousands of other pass actions.
The pass direction is determined by a simple heuristic and points to the middle of the direction category. Different methodologies may lead to different results.
Overall, we can conclude that the pass direction semantics are well captured by the model. So now that we understand how analogies are built, let us review many more.
Moving the ball forward
This analogy aims to capture the basic notion of pushing the ball forward, closer to the goal. Analogy relation: pass → the same pass from a more advanced position (closer to the goal).

I find it quite fascinating to see how the action position and type are always properly captured, while additional attributes vary slightly across different candidates.
Foot analogy
Analogy relation: pass with right foot → same pass with the left foot.

For simplicity, I removed the technical attributes of the passes. The model always respects them when finding similar tokens, so just assume these analogies hold for any technique and type.
Pass height analogy
Analogy relation: pass with lower height → same pass with greater height.

Context understanding: goals
Analogy relation: a shot that was saved → the same shot with goal outcome.

This one really blew my mind. I thought that A to A’ reflects making a bad shot better. Well, that is one way to describe it. However, shots taken within the box are often saved and then scored from a rebound. So, in a way, it is also a good way to assist (many FIFA gamers will find this familiar 😉 ). Hence, potential assists can fit this analogy, as well as missed lob shots.
So far so good, but we are just getting started.
Explainers for players #1 – explaining players with player analogies
As we did for actions, we can produce player-to-player analogies. Since each document represents a player within a specific match, a player enters an analogy through a sampled match, making the analogy itself a random variable. Confusing as it sounds, I therefore sampled matches for each player in the analogy and repeated the analogy ten times. Results were highly consistent.
Defender to striker analogy
We can start by examining a simple relation of a defender to an attacker by calculating the distance between such players. We expect this distance will be similar across different pairs of center backs and forwards.
Pique (center-back) – Suarez (striker) + Benzema (striker) ~ ?
Three top matching players with at least 5 matches: Maxwell Cabelino (left-back), Jordi Alba (left-back), and Javier Mascherano (center-back).
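For reference, here is a minimal sketch of how such a player analogy can be repeated over sampled matches with a Doc2Vec model. The model path and the "player | match_id" tag convention are assumptions; adapt them to the actual document tags used when training PlayerMatch2Vec:

```python
import random
from collections import Counter

from gensim.models import Doc2Vec

model = Doc2Vec.load("models/player_match2vec.model")  # placeholder path

def sample_doc_tag(player_name, rng):
    # Assumes document tags start with the player's name, e.g., "Gerard Piqué | 12345".
    tags = [tag for tag in model.dv.index_to_key if tag.startswith(player_name)]
    return rng.choice(tags)

def player_analogy(a, a_prime, b_prime, repeats=10, topn=3, seed=42):
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(repeats):
        # Solve a - a' + b' over sampled player-match documents.
        result = model.dv.most_similar(
            positive=[sample_doc_tag(a, rng), sample_doc_tag(b_prime, rng)],
            negative=[sample_doc_tag(a_prime, rng)],
            topn=topn,
        )
        votes.update(tag.split("|")[0].strip() for tag, _ in result)
    return votes.most_common(topn)

print(player_analogy("Gerard Piqué", "Luis Suárez", "Karim Benzema"))
```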
Center-to-flank analogy
We can compare players in similar positions, one in the center and one at the flank. For example, from center defense to flank:
Alba (left-back) – Pique (center-back) + Ramos (center-back) ~ ?
Best matches: Daniel Alves (right-back), Lucas Digne (left-back).
Style and skill analogies
Here, we will compare players in similar positions but with different styles of play or skills.
For example, I chose Antoine Griezmann, a forward in La Liga, and measured his distance from Ousmane Dembele – also an attacker, but more winger-oriented. The differences between them can easily be seen in the data, whether in their average positioning on the pitch, their number of dribbles per match, etc. Then, I took Neymar da Silva Santos Júnior, whom I (and the model) find more similar to Dembele than to Griezmann. The resulting analogy is: Griezmann – Dembele + Neymar ~ ?.
Best match: Pedro Rodríguez – a forward (during his time in Barcelona). I am impressed, but I’m also biased. So – what do you think about it? Let me know!

Each analogy could get its own deep dive, but we have exciting things coming…right about now…
Explainers for players #2 – combining players with actions into player variations
So player analogies are nice, but let’s take it one step further. We have a massive collection of player representations and a very rich language to describe them – let’s mix them up.
Combining players with actions allows us to generate endless local variations for a player, across one or more skills. For example, we can create offensive variations with more shots or crosses, or enhance defensive skills by replacing bad tackles with successful ones. These variations can serve as explainers, similar to LIME.
Let’s take Toni Kroos, for example. According to our model, he is less similar to Andres Iniesta than he is to Cesc Fabregas. Why? By simulating different versions of Kroos in the football space, each with a different modification, we can see that the two distinctive factors that separate him from Iniesta are dribbling and scoring. Sounds promising, but first, three comments:
- Results consistency – Doc2Vec inference for documents unseen during training is stochastic. Thus, I repeated the process several times and averaged the results.
- It seems reasonable to filter out results with low similarity (I filtered results with cosine similarity < 0.7), as well as players with lower similarity scores than the inspected player himself.
- Document design: the Player2Vec representation is defined by a player and a match and is averaged across matches to represent a player. During training, the model embeds native documents of real football possessions. To avoid garbage-in garbage-out, we should supply meaningful, properly designed documents at inference time as well.
What operation can we apply on top of players’ documents to produce meaningful variations? How can we do it?
Simplistic approach – summing vectors
Since the Word2Vec and Doc2Vec models are linear in nature, we can produce player variations by merely summing a player’s representation vector with an ad-hoc document of a specific set of actions. For example, we can sample dribble actions and assemble them all into one document, following a sort of bag-of-words approach.
In the first post, I demonstrated the usage of this method. This way I was able to conclude that Andres Iniesta + outbox scoring – dribbling ~ Toni Kroos.
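A minimal sketch of this summation, assuming a trained PlayerMatch2Vec model, a hypothetical `get_player_vector` helper that averages a player's match vectors, and an illustrative action token (the real tokens come from the Action2Vec vocabulary):

```python
from gensim.models import Doc2Vec

model = Doc2Vec.load("models/player_match2vec.model")  # placeholder path

# Hypothetical helper: average the player's match vectors into one Player2Vec vector.
player_vector = get_player_vector("Andrés Iniesta")

# Ad-hoc bag-of-actions document; the token below is illustrative, not a real vocabulary entry.
extra_actions = ["(4/5,3/5):shot|outbox|goal"] * 5
delta = model.infer_vector(extra_actions)

# Sum the player with the ad-hoc document and look for the nearest player-match documents.
variation = player_vector + delta
print(model.dv.most_similar(positive=[variation], topn=5))
```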
Despite the promising results presented, designing these documents is a subtle process to navigate. Thus, I will present two approaches I feel more comfortable with: modifying player actions and enriching player actions.
Modifying player actions
Here, we iteratively loop over the player’s documents and execute a set of given interventions, each holding a pattern to match and a modifier function that transforms the observed action. We apply the modification with probability p when the conditions allow. This approach lets us create endless variations of a player, covering many aspects of the game, while preserving the native document design.
Note: Since this approach relies on changing existing actions, informative variations require players to execute the relevant action frequently enough during their matches.
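Here is a minimal sketch of such an intervention, assuming a player's documents are simple lists of action tokens; both the pattern and the modifier below (turning ground or low passes into high ones, as in the de Jong example that follows) are illustrative:

```python
import random

def make_high(action):
    # Illustrative modifier: convert a ground or low pass token into a high one.
    return action.replace("ground", "high").replace("low", "high")

def modify_documents(documents, pattern, modifier, p=0.1, seed=42):
    """Apply `modifier` to every action matching `pattern`, with probability p."""
    rng = random.Random(seed)
    return [
        [modifier(a) if pattern(a) and rng.random() < p else a for a in doc]
        for doc in documents
    ]

# Usage: a variation with ~10% of the player's ground/low passes converted to high passes.
# `de_jong_documents` is a hypothetical list of his player-match documents.
variation_docs = modify_documents(
    de_jong_documents,
    pattern=lambda a: "ground" in a or "low" in a,  # illustrative pattern for ground/low passes
    modifier=make_high,
    p=0.1,
)
```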
For example, we can create variations of Frenkie de Jong in which his passes are converted into high passes (Figure 8). For p=0.1, the most similar players are Thiago, Fabregas, Arthur, Rakitic, etc., with cosine similarity values > 0.95. As we increase p, their similarity decreases while the similarity of other players, such as Robert Pirès, increases. When p > 0.8 (a drastic change rate in the player’s actions), the most similar players are in fact goalkeepers, as they tend to use high balls frequently.
This modification is highly visible in the 2-D reduction of our representation as well (Figure 8). As p increases, we can observe non-standard behavior, as the corresponding variations become distinct from all the native players. However, it is a great time for a reminder – this is merely a very compressed projection of the data, nothing more. The full knowledge of the representation is embedded within all 32 dimensions.
Figure 8 covers another variation, of Ousmane Dembele, with reduced usage of his right leg. The intervention pattern matches right-footed actions, and the modifier function simply switches the action’s leg to the left (Figure 8). For p=0.1, the three most similar players are Pedro Rodríguez, Isaac Cuenca, and Cristian Tello. However, for p=0.8, the three most similar players are Isaac Cuenca, Bojan Krkić, and Gerard Deulofeu. Cristian Tello, who also uses both his legs, is now out of the top three.
This is how all these variations look in 2-D, using our beloved UMAP projection:
Figure 8: Plotly interactive Player2Vec [UMAP](https://umap-learn.readthedocs.io/en/latest/) projection of all players in the dataset, as well as selected players’ variations. Players are colored by position, while some selected well-known players are colored by name and can be filtered directly. Image by Author.
BTW, I also tried to improve players’ skill levels by transforming each failed action observed in the player’s documents into a success with probability p. However, doing so didn’t produce any interesting findings. As I mentioned several times, our representation deals with what the player does rather than how well he does it; this is a good validation of that statement. To address skill levels, the model would have to be adjusted and extended for that purpose.
Enriching player actions
For cases where we also want to change an action’s prevalence, we can add words from the relevant action family or remove matching actions with probability p.
As before, we iteratively loop over the player’s documents and execute a set of given interventions, each holding a pattern to match and an enricher function that builds the desired action to add. We apply the enrichment with probability p when the conditions allow, resulting in more occurrences of the skill’s related actions.
What should we care about when doing enrichments? If we want to add more dribbles to a player, for example, we should add them in matching locations and appropriate situations – after a ball receipt, for instance. We would also like to avoid unrealistic scenarios, such as dribbling inside your own five-meter box.
To this end, I identified appropriate situations using a 1-gram model, which basically means that I modeled the probability of an action occurring given the previous action, as observed in the data. The resulting distribution, p_dribble, describes which action a dribble tends to follow: 25% of dribbles occur after a Pass, 23% after a Ball Receipt, 20% after a Carry, 13% after Pressure, and 4% after a Ball Recovery. For simplicity, I decided to neglect the long tail of 15% for all other action types and to skip actions performed in the team’s own half.
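A minimal sketch of both steps – estimating which actions precede a dribble and inserting new tokens only in appropriate situations – again assuming documents are lists of action tokens and using illustrative token names:

```python
import random
from collections import Counter

def previous_action_distribution(documents, target="dribble"):
    """Estimate the distribution of the action preceding `target` (the 1-gram model)."""
    counts = Counter()
    for doc in documents:
        for prev, curr in zip(doc, doc[1:]):
            if target in curr:
                counts[prev] += 1
    total = sum(counts.values())
    return {action: n / total for action, n in counts.items()}

def enrich_documents(documents, is_appropriate, build_action, p=0.1, seed=42):
    """Insert a newly built action after appropriate actions, with probability p."""
    rng = random.Random(seed)
    enriched = []
    for doc in documents:
        new_doc = []
        for action in doc:
            new_doc.append(action)
            # The is_appropriate condition is where the 1-gram statistics
            # (and the own-half filter) come into play.
            if is_appropriate(action) and rng.random() < p:
                new_doc.append(build_action(action))
        enriched.append(new_doc)
    return enriched
```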
So what happens if we reduce the number of dribbles performed by a player? Figure 8 also demonstrates these variations for the talented dribbler Andres Iniesta. For p=0, the most similar players are Thiago, Arthur, and Fabregas (cosine similarity > 0.9). For 0.02 < p < 0.2, the cosine values are still > 0.9, but the order changes – Ricard Puig and Coutinho become the most similar. Last, for 0.2 < p < 0.5, Arthur achieves a 0.97 cosine similarity while all others fail to pass the 0.7 minimum threshold.
Explaining embedding dimensions using player variations
We can also use the variations to learn about the representation’s dimensions. Specifically, we can calculate the variance of each vector dimension across all variation types. Dimensions with high variance are the ones that changed the most across the different values of p and are therefore the most associated with the applied modifications.
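As a minimal sketch, assuming we already inferred the vectors of one player's variations for a given skill (one vector per value of p, via a hypothetical `infer_variation_vector` helper):

```python
import numpy as np

# Hypothetical helper: infer the 32-dimensional vector of the player's variation for a given p.
variation_vectors = np.vstack([infer_variation_vector(p) for p in (0.1, 0.3, 0.5, 0.8)])

per_dimension_variance = variation_vectors.var(axis=0)   # variance of each of the 32 dimensions
top_dims = np.argsort(per_dimension_variance)[::-1][:5]  # dimensions most affected by the skill
print({int(d): float(per_dimension_variance[d]) for d in top_dims})
```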

We can learn from Figure 9 that different skills affect different dimensions. This holds even when the skill is the same but the intervention type is different – modification versus enrichment. Overall, enrichments cause a higher variance, i.e., a more significant effect on the representation.
In addition, we see that some skills have a higher absolute influence on the vector dimensions than others. For example, the order of magnitude for changing the player’s foot side is significantly lower compared to modifying the pass height. Mathematically, the foot change had a much lower variance across the dimensions and a lower effect on the cosine similarity results.
As I mentioned before, changing merely the action’s outcome (see ‘shot’ and ‘dribble’ skills in Figure 9) resulted in a very minor impact, barely separable from mere noise.
So we learned that some elements are more dominant in the representation than others. Extensions to the model or to the preprocessing can mitigate such cases. For example, to amplify the difference caused by changing a shot’s outcome, we can add a separate word for goals and place it in the appropriate positions. This will create a dependency the model can capture.
One more thing: Understanding the representation variance
Just before we reach the 90th minute of this post – do you remember my note from the first article that a player’s spread in space is likely to mean something as well? So, I decided to check it out, and this is how it looks:
We should bear in mind that this is merely a 2D projection (using our beloved UMAP) of a 32-dimensional representation. However, we can see a clear linear trend. Does this trend mean anything? As it turns out, it may.
I highlighted the names of some top-level players in various positions. Going along the trend, it is much more likely to encounter such players than going against it. Do these players change their behavior across different matches? Are they indeed better players? Of course, there is a selection bias and the data is imbalanced (therefore the variance is affected), so there is much more to analyze before drawing conclusions. Meanwhile, you can explore it interactively.
Summary
In this post, we took a deep look under the hood of the language of football and the Player2Vec model. We got to know different types of explainers, both global and local, and we used analogies as a semantic measurement tool, both on actions and players.
Then, we moved to more advanced methods of explaining players. Specifically, I introduced the notions of modifying and enriching players into player variations, allowing us to explain players’ representations, as well as similarities and distances between entities.
Which explainer to use and for what purpose is up to you and depends on the business needs. As I mentioned before, we could fill a book with examples, ideas, and different approaches – there are many other methods, and some will probably serve certain use cases better.
What’s Next?
So, I have to stop here. The next post will be mainly about player skill evaluation, including the fancy Streamlit dashboard I promised.
Feel free to contact me for any inputs, comments, or requests.
Final whistle.