Perplexity Intuition (and its derivation)

Never be perplexed again by perplexity.

Aerin Kim
Towards Data Science

You might have seen something like this in an NLP class:

A slide from Dr. Luke Zettlemoyer’s NLP class

Or

A slide from Stanford's CS 124 (Dr. Dan Jurafsky)

In class, we don't really spend time deriving perplexity. Maybe it's assumed to be a basic concept that you already know? This post is for those who don't.

In general, perplexity is a measure of how well a probability model predicts a sample. In the context of Natural Language Processing, perplexity is one way to evaluate language models.

But why is perplexity in NLP defined the way it is?

If you look up the perplexity of a discrete probability distribution on Wikipedia, you will find:

$$\text{Perplexity}(p) = 2^{H(p)} = 2^{-\sum_x p(x)\log_2 p(x)}$$

where H(p) is the entropy of the distribution p(x) and x is a random variable over all possible events.
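To make the definition concrete, here is a minimal sketch (Python with NumPy; the toy distributions are just illustrative, not from the original post) that computes H(p) in bits and then raises 2 to it:

```python
import numpy as np

def perplexity(p):
    """Perplexity of a discrete distribution: 2 raised to its entropy (in bits)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                        # terms with p(x) = 0 contribute nothing
    entropy = -np.sum(p * np.log2(p))   # H(p) in bits
    return 2.0 ** entropy

# A fair six-sided die: entropy is log2(6) bits, so perplexity is exactly 6.
print(perplexity([1/6] * 6))    # ~6.0
# A biased coin is less "surprising" than a fair one, so its perplexity is below 2.
print(perplexity([0.9, 0.1]))   # ~1.38
```

The fair-die case is a handy sanity check: a uniform distribution over k outcomes always has perplexity exactly k, which is why perplexity is often read as an "effective number of choices."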

In the previous post, we derived H(p) from scratch and intuitively showed why entropy is the average number of bits needed to encode the information contained in a random variable.

