Apache Spark for Data Science — Word Count With Spark and NLTK

Learn to count the words in a book and handle the common stop word issue, implemented in PySpark

Dario Radečić
Towards Data Science
7 min read · Apr 16, 2022

Photo by Warren Wong on Unsplash

Do you know what the most common beginner exercise in Apache Spark is? You guessed it: word counts. The idea is to grab a text document, preferably a long one, and count the occurrences of each…
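To make the idea concrete before any Spark code, here is a minimal plain-Python sketch of a word count that skips stop words. It is not necessarily the approach the article takes: the `count_words` helper, the tiny `STOP_WORDS` set, and the sample sentence are all hypothetical and exist only for illustration. A real stop word list, such as the one NLTK ships, is much longer.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word set used only for this sketch.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "it", "is"}


def count_words(text, stop_words=STOP_WORDS):
    """Return a Counter of lowercase word frequencies, skipping stop words."""
    # Lowercase the text, then extract runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    # Count only the words that are not in the stop word set.
    return Counter(w for w in words if w not in stop_words)


sample = "It is the best of times, it is the worst of times."
counts = count_words(sample)
print(counts.most_common(3))
```

The same three steps (split text into words, drop stop words, sum the counts per word) correspond roughly to the `flatMap`, `filter`, and `reduceByKey` operations on a Spark RDD, which is what lets the exercise scale to a long document.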
