An intuitive mathematical introduction

[In my last article, I scratched the surface of the different reasons why NLP datasets end up biased. Feel free to go and take a look, as this article builds upon it!]
As seen earlier, datasets tend to get biased when certain terms become associated with one particular label. Models trained on such datasets capture this association and behave poorly when the context of these terms is inverted. For example, a model that has seen the term ‘kill’ in ‘hateful’ tweets will be biased towards predicting any new tweet containing this term as ‘hateful’ instead of ‘non-hateful’.
In this article, I will take this a step further by isolating the terms in a dataset that are most likely to introduce bias.
2 metrics to identify biased terms
I will be going through two common derivatives of mutual information to quantify the correlation of terms with labels.
Pointwise mutual information (PMI)
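For a term and a label, PMI compares how often the term appears under that label with how often it appears overall:

PMI(term, label) = log [ p(term | label) / p(term) ]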

Let’s dissect this equation piece by piece and try to understand why PMI can be helpful for finding term-label correlations:
- p(term|label): gives the probability of seeing the term in samples (or tweets, or documents, pick your favorite) belonging to the label
- p(term): gives the probability of seeing a term in any sample (that is, across all labels)
- log: let’s simplify the logarithm to better understand what it tells us

There are two cases of interest:
- p(term|label) > p(term): the term is more likely to appear in the label’s samples than in the dataset as a whole, i.e. the term is positively correlated with the label. The ratio inside the log is greater than 1, so the resulting PMI is positive
- p(term|label) < p(term): the term is less likely to appear in the label’s samples than in the dataset as a whole, i.e. the term is negatively correlated with the label. The ratio inside the log is less than 1, so the resulting PMI is negative
Hence, we see that PMI captures the effect of term-label correlation.
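To make this concrete, here is a minimal Python sketch of how such document-level PMI scores could be computed. The toy tweets and the counting choices (one count per term per document, no smoothing) are illustrative assumptions, not the exact setup used on the real dataset later in the article.

```python
import math
from collections import Counter

def pmi_scores(docs):
    """PMI(term, label) = log( p(term | label) / p(term) ), estimated
    from document-level counts over a list of (tokens, label) pairs.
    A sketch only: no smoothing, no real tokenization."""
    term_label = Counter()   # n(term, label): docs of this label containing the term
    label_docs = Counter()   # n(label): number of docs per label
    term_docs = Counter()    # n(term): number of docs containing the term
    n_docs = 0

    for tokens, label in docs:
        n_docs += 1
        label_docs[label] += 1
        for term in set(tokens):          # count each term once per document
            term_label[(term, label)] += 1
            term_docs[term] += 1

    scores = {}
    for (term, label), n_tl in term_label.items():
        p_term_given_label = n_tl / label_docs[label]
        p_term = term_docs[term] / n_docs
        scores[(term, label)] = math.log2(p_term_given_label / p_term)
    return scores

# Hypothetical toy tweets (not real data), just to see the scores move
docs = [
    ("this is stupid".split(), "hate"),
    ("such a stupid hateful rant".split(), "hate"),
    ("worst and most stupid take ever".split(), "hate"),
    ("worst traffic this morning".split(), "non-hate"),
    ("worst coffee i ever had".split(), "non-hate"),
    ("lovely weather today".split(), "non-hate"),
]

for (term, label), score in sorted(pmi_scores(docs).items()):
    if term in {"stupid", "worst"}:
        print(f"PMI({term!r:>9}, {label!r:>10}) = {score:+.2f}")
```

With these toy counts, "stupid" (which appears only in hate tweets) gets a clearly positive PMI for the hate label, while "worst" (which appears mostly in non-hate tweets) gets a negative one.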
Let’s understand with some examples.

- "worst" gets 0 PMI values for both labels as its occurrence does not give any information towards what the label could be
- "stupid" gets a positive value for the hate label, as it occurs exclusively in that label, and a negative value for the non-hate label, as it never occurs in that label

- "worst" gets a lower PMI than "stupid" for hate label due to occurrence in lesser hate samples; both terms get positive values for this label as both are more correlated to it versus the other label
- "worst" gets a higher PMI than "stupid" for non-hate label due to "stupid" being more strongly correlated with hate label

Hold on, do you see a problem with this example? We get equal scores for both "worst" and "stupid", as they are both perfectly correlated with the hate label. But this isn’t fair: "worst" occurs only once in the whole dataset! This is bad because a term can end up among the top biased terms just by having a single occurrence in the whole dataset.

As a concrete example, I found the top-15 biased terms for the hate label on a hate speech detection dataset, HatEval (Basile et al., 2019). On inspection, all of these terms carry the same PMI value simply because they occur exclusively under this label. Notice that the list also contains many misspellings, which are highly likely to occur just once across the whole dataset!
Local mutual information (LMI): A reweighted PMI
We just saw a big caveat of PMI: it fails to account for term frequencies, giving high weights to terms that hardly occur in the dataset, like many misspellings.
The solution is to reweight the PMI values.
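In other words, LMI scales each PMI value by the joint probability of the term and the label:

LMI(term, label) = p(term, label) × log [ p(term | label) / p(term) ]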

- p(term, label): the joint probability of the term and the label, which acts as the adjustment factor

Notice that the denominator of this probability is the same constant for every term, while the numerator depends directly on the term’s frequency. This is the missing piece we were looking for.
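Extending the earlier PMI sketch, LMI could be computed as below. The estimator for p(term, label) (joint document count over the total number of documents) is one reasonable choice under the same assumptions, and top_biased_terms is just a hypothetical helper for ranking terms per label.

```python
import math
from collections import Counter

def lmi_scores(docs):
    """LMI(term, label) = p(term, label) * log( p(term | label) / p(term) ),
    i.e. PMI reweighted by how often the (term, label) pair actually occurs.
    Same document-level counting as the PMI sketch above."""
    term_label = Counter()
    label_docs = Counter()
    term_docs = Counter()
    n_docs = 0

    for tokens, label in docs:
        n_docs += 1
        label_docs[label] += 1
        for term in set(tokens):
            term_label[(term, label)] += 1
            term_docs[term] += 1

    scores = {}
    for (term, label), n_tl in term_label.items():
        p_term_given_label = n_tl / label_docs[label]
        p_term = term_docs[term] / n_docs
        # numerator grows with the term's frequency; the denominator n_docs is constant
        p_term_label = n_tl / n_docs
        scores[(term, label)] = p_term_label * math.log2(p_term_given_label / p_term)
    return scores

def top_biased_terms(docs, label, k=15):
    """Rank candidate bias terms for one label by their LMI score."""
    ranked = [(term, s) for (term, l), s in lmi_scores(docs).items() if l == label]
    return sorted(ranked, key=lambda x: x[1], reverse=True)[:k]

# e.g. top_biased_terms(docs, "hate") with the toy tweets from the PMI sketch
```

Because rare terms have a tiny p(term, label), a single-occurrence misspelling can no longer outrank a frequent term like "stupid", even if both are perfectly correlated with the label.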

Readjusting Example-3 using LMI, the term "worst" is now correctly assigned a much lower value than the more frequent "stupid".

Applying LMI on the previous concrete example, the top biased terms for the hate label are actually so hateful that I had to censor them out 👶
What’s next
In this article, we saw how to isolate bias-inducing terms in our NLP datasets. Now that we know who our enemy is, we can start planning how to defeat it ⚔️