Perplexity Intuition (and its derivation)

Never be perplexed again by perplexity.

You might have seen something like this in an NLP class:

A slide from Dr. Luke Zettlemoyer’s NLP class

Or

A slide from CS 124 at Stanford (Dr. Dan Jurafsky)

During the class, we don’t usually spend time deriving perplexity. Maybe it is a basic concept that you already know? This post is for those who don’t.

In general, perplexity is a measurement of how well a probability model predicts a sample. In the context of Natural Language Processing, perplexity is one way to evaluate language models.

But why is perplexity in NLP defined the way it is?


If you look up the perplexity of a discrete probability distribution on Wikipedia, you will find

Perplexity(p) = 2^H(p) = 2^(−Σ p(x) log₂ p(x))

(from https://en.wikipedia.org/wiki/Perplexity)

where H(p) is the entropy of the distribution p(x) and x is a random variable over all possible events.

In the previous post, we derived H(p) from scratch and showed intuitively why entropy is the average number of bits we need to encode the information. If you are not comfortable with H(p), please read that post before reading further.

Now we agree that H(p) = −Σ p(x) log₂ p(x).

Then, perplexity is just an exponentiation of the entropy!

Yes. Entropy is the average number of bits needed to encode the information contained in a random variable, so exponentiating the entropy gives the effective size of the outcome space, or more precisely, the weighted average number of choices the random variable has.

For example, if the average sentence in the test set could be coded in 100 bits, the model perplexity is 2¹⁰⁰ per sentence.
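To make the “number of choices” reading concrete, here is a minimal sketch (plain Python with NumPy; the function names are mine, not from any particular library) that computes the entropy and the corresponding perplexity of a few small distributions:

```python
import numpy as np

def entropy(p):
    """Entropy in bits of a discrete distribution p (zero-probability outcomes contribute 0)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def perplexity(p):
    """Perplexity = 2 ** entropy: the weighted average number of choices."""
    return 2 ** entropy(p)

# A fair coin: entropy = 1 bit, so perplexity = 2 choices.
print(perplexity([0.5, 0.5]))                           # 2.0

# A fair six-sided die: entropy = log2(6) ≈ 2.585 bits, so perplexity = 6.
print(perplexity([1/6] * 6))                            # ~6.0

# A heavily skewed die behaves like far fewer than 6 choices.
print(perplexity([0.9, 0.02, 0.02, 0.02, 0.02, 0.02]))  # ~1.6
```

The skewed die is almost always going to land on one face, and its perplexity reflects that: far fewer effective choices than 6.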


Let’s confirm that the definition in Wikipedia matches the one in the slides. When we score a model q on data drawn from the true distribution p, the exponent is the cross-entropy rather than the entropy:

Perplexity(p, q) = 2^H(p, q) = 2^(−Σ p(x) log₂ q(x))

where

p : the probability distribution that we want to model. The training samples are drawn from p, whose true distribution is unknown.

q : our proposed probability model, i.e., our prediction.

We can evaluate our prediction q by testing it against samples drawn from p; doing so is exactly computing the cross-entropy H(p, q). In the example above, we assumed that all words have the same probability (1 / # of words) under p, i.e., p is the empirical distribution of the test set.
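Spelling out that last step (a sketch of the algebra, assuming p puts probability 1/N on each of the N test words and writing q(wᵢ) for the model’s probability of the i-th word given its history):

```latex
\begin{align*}
\mathrm{Perplexity}(p, q)
  &= 2^{-\sum_{x} p(x)\,\log_2 q(x)}
   = 2^{-\frac{1}{N}\sum_{i=1}^{N}\log_2 q(w_i)} \\
  &= \Big(\prod_{i=1}^{N} q(w_i)\Big)^{-1/N}
   = q(w_1 w_2 \cdots w_N)^{-1/N}
\end{align*}
```

which is exactly the slide definition: the inverse probability of the test set, normalized by the number of words.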

Observation

  • When q(x) = 0 for some word x that appears in the test sample, the perplexity becomes infinite (∞). In fact, this is one of the reasons why the concept of smoothing (in NLP) was introduced.
  • If we use a uniform probability model for q (simply 1/N for each of the N words in the vocabulary), the perplexity equals the vocabulary size. Both of these observations are easy to check numerically; see the sketch after this list.
  • A language model is a probability distribution over sentences, and the best language model is the one that best predicts an unseen test set. Perplexity is the inverse probability of the test set, normalized by the number of words.
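A minimal sketch of the first two observations, using a toy unigram model (the vocabulary, test words, and function name below are made up purely for illustration):

```python
import math

def test_set_perplexity(q, test_words):
    """Perplexity of model q (a dict: word -> probability) on a list of test words:
    2 ** cross-entropy, i.e. the inverse probability of the test set
    normalized by the number of words."""
    log_prob = 0.0
    for w in test_words:
        p_w = q.get(w, 0.0)
        if p_w == 0.0:
            return float("inf")   # a zero-probability word blows up the perplexity
        log_prob += math.log2(p_w)
    return 2 ** (-log_prob / len(test_words))

vocab = ["the", "cat", "sat", "on", "mat"]
test = ["the", "cat", "sat", "on", "the", "mat"]

# Uniform model: every word gets 1/|V|, so the perplexity equals the vocabulary size.
uniform = {w: 1 / len(vocab) for w in vocab}
print(test_set_perplexity(uniform, test))   # 5.0

# A model that assigns zero probability to "mat" has infinite perplexity on this test set.
no_mat = {"the": 0.4, "cat": 0.2, "sat": 0.2, "on": 0.2, "mat": 0.0}
print(test_set_perplexity(no_mat, test))    # inf
```

Smoothing reassigns a little probability mass to unseen words precisely so that the second case never returns infinity.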

Takeaway

  • Less entropy (a less disordered system) is favorable over more entropy, since predictable results are preferred over randomness. This is why people say low perplexity is good and high perplexity is bad: perplexity is the exponentiation of the entropy (and you can safely carry your intuition about entropy over to perplexity).
  • The difference between the perplexity of a (language) model and the true perplexity of the language is an indication of the quality of the model.
  • Why do we use perplexity instead of entropy?
    If we think of perplexity as a branching factor (the weighted average number of choices a random variable has), that number is easier to understand than the entropy. I found this surprising because I thought there would be more profound reasons. I asked Dr. Zettlemoyer if there is any reason other than easy interpretability. His answer was “I think that is it! It is largely historical, since lots of other metrics would be reasonable to use also!”

If you like my post, could you please clap? It gives me motivation to write more. :)
