Perplexity Intuition (and its derivation)
Never be perplexed again by perplexity.
You might have seen something like this in an NLP class:

$$PP(W) = P(w_1 w_2 \dots w_N)^{-\frac{1}{N}}$$

Or

$$PP(W) = \sqrt[N]{\frac{1}{P(w_1 w_2 \dots w_N)}}$$
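To make the formula concrete, here is a minimal Python sketch that scores a test sentence under a toy unigram model. The word probabilities below are made-up assumptions purely for illustration, not estimates from any real corpus:

```python
import math

# Toy unigram model: made-up probabilities, for illustration only.
unigram_p = {"the": 0.3, "cat": 0.1, "sat": 0.05, "on": 0.15, "mat": 0.05}

def perplexity(sentence):
    """Per-word perplexity: PP(W) = P(w_1 ... w_N)^(-1/N).

    Computed in log space as 2^(-(1/N) * sum(log2 P(w_i)))
    to avoid underflow on long sentences.
    """
    words = sentence.split()
    log_prob = sum(math.log2(unigram_p[w]) for w in words)
    return 2 ** (-log_prob / len(words))

print(perplexity("the cat sat on the mat"))  # lower = model predicts the sample better
```

Note the log-space trick: multiplying many small probabilities underflows quickly, so in practice perplexity is always computed by summing log probabilities.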
In class, we rarely spend time deriving perplexity. Maybe it's a basic concept you already know? This post is for those who don't.
In general, perplexity measures how well a probability model predicts a sample. In the context of Natural Language Processing, perplexity is one way to evaluate language models.
But why is perplexity in NLP defined the way it is?
If you look up the perplexity of a discrete probability distribution on Wikipedia, you will find:

$$PP(p) = 2^{H(p)} = 2^{-\sum_{x} p(x) \log_2 p(x)}$$

where $H(p)$ is the entropy of the distribution $p(x)$ and $x$ is a random variable over all possible events.
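As a sanity check on that definition, here is a short Python sketch that computes $2^{H(p)}$ for two made-up distributions (the distributions are illustrative assumptions, not taken from the post):

```python
import math

def perplexity_of_distribution(p):
    """PP(p) = 2^H(p), with H(p) = -sum_x p(x) * log2(p(x))."""
    entropy = -sum(px * math.log2(px) for px in p.values() if px > 0)
    return 2 ** entropy

# A fair 4-sided die has entropy log2(4) = 2 bits, so perplexity is exactly 4:
# the model is as uncertain as a uniform choice among 4 outcomes.
fair = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}
print(perplexity_of_distribution(fair))    # 4.0

# A skewed distribution is easier to predict, so its perplexity is lower.
skewed = {1: 0.7, 2: 0.1, 3: 0.1, 4: 0.1}
print(perplexity_of_distribution(skewed))  # ~2.56
```

The uniform case is the useful intuition: a perplexity of k means the model is, on average, as uncertain as if it were choosing uniformly among k options.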
In the previous post, we derived H(p) from scratch and intuitively showed why entropy is the average number of bits needed to encode the information contained in a random variable.