Detecting Document Similarity With Doc2vec

A step-by-step, hands-on introduction in Python

Omar Sharaki
Towards Data Science
12 min read · Jul 10, 2020


“assorted berries” by William Felker on Unsplash

There is no shortage of ways to analyze and make sense of textual data. Such methods generally fall under an area of artificial intelligence called Natural Language Processing (NLP).

NLP allows us to perform a multitude of tasks where our data consists of text or…



Software developer, standup comedian, and guy you wouldn’t mind sitting next to on a plane.