Overview of tokenization algorithms in NLP

Introduction to tokenization methods, including subword, BPE, WordPiece and SentencePiece

Ane Berasategi
Towards Data Science
8 min read · Aug 12, 2020

Photo by Hannah Wright on Unsplash

⚠️ READ THE ORIGINAL POST IN MY BLOG ⚠️

This article is an overview of tokenization algorithms, covering word-level, character-level and subword-level tokenization, with emphasis on BPE…
