
Beginnings are always the most difficult. You feel lost, and you don’t know where to start or which way is best. That is even more true if what you’re trying to get into is a complex and very broad field like data science. Data science is not just broad; it is also one of those fields with an overwhelming amount of information available online.
One of the most popular branches of data science is natural language processing (NLP). NLP is a branch of computer science concerned with enabling computers to understand and use natural languages. The desire for computers to understand and communicate with us has been there since the invention of computers themselves.
In the past few years, however, computing has advanced at a rapid pace, turning technologies such as machine learning and big data into a reality. And not just a reality, but technology we use every day.
Because the popularity of NLP keeps growing, new people consider joining the field every day. Books are one of the best resources for learning anything, but there are so many books out there about data science and NLP. So, which should you read?
This article will walk you through 6 amazing NLP books that will take you from an NLP newbie to an NLP expert in no time.
№1: Text Analytics with Python: A Practitioner’s Guide to Natural Language Processing
Text Analytics with Python: A Practitioner’s Guide to Natural Language Processing is a great book that teaches you how to set up and build a robust environment that you can use for various text analytics tasks. Furthermore, the book presents many techniques and models based on recent research advances in NLP.
The book will take you from the absolute basics of NLP using Python to more advanced topics and real-life applications. It then goes in-depth into the theory, implementation, and use cases of several popular NLP applications, including parsing and processing text, text summarization, topic modeling, and semantic analysis.
All the code mentioned in the book can be accessed on GitHub for free. Perhaps this book’s greatest advantage is that it takes you on the whole journey, from getting started to building an actual NLP application. That’s why it works for IT professionals, developers, engineers, and analysts.
№2: NLTK Book
If you’re using Python to learn NLP, then at some point in your journey you have probably come across NLTK. NLTK (Natural Language Toolkit) is one of the most popular NLP libraries in Python.
The library has its own freely available book that covers the basics of NLP as well as the library’s various functions and methods and how you can use them to build real-life NLP projects. I should mention that this book is only meant to be an introduction to NLP; it will not go deep into advanced algorithms and techniques.
Rather, it will focus on helping you build a strong foundation for parsing, processing, and analyzing text data. The book is updated to match NLTK 3 and Python 3.
№3: Speech and Language Processing
Speech and Language Processing is written by Dan Jurafsky of Stanford University and James H. Martin of the University of Colorado Boulder. The book’s newest draft (Dec 2020) is available for free on the book’s official website, where you can read all of its chapters.
This book covers the basic speech and language processing techniques, from the very beginning to the more advanced topics. It starts with NLP basics, such as n-grams, text normalization, and regular expressions, then moves on to regression, deep learning, neural networks, and machine translation.
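To give a feel for those basics, here is a minimal sketch of text normalization, regular-expression tokenization, and n-gram counting using only the Python standard library. It is my own illustration, not code from the book; the function names `normalize` and `ngrams` and the sample sentence are assumptions made up for this example.

```python
import re
from collections import Counter


def normalize(text):
    """Lowercase the text and extract word tokens with a regular expression."""
    # Keeps runs of letters, digits, and apostrophes; drops other punctuation.
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    """Return all n-grams (as tuples) found in a sequence of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical sample text, chosen only for illustration.
tokens = normalize("The cat sat on the mat. The cat slept!")
bigram_counts = Counter(ngrams(tokens, 2))

print(tokens)
print(bigram_counts.most_common(1))
```

A real tokenizer has to deal with contractions, hyphens, and non-English scripts, so treat the regular expression here as a deliberately simple starting point.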
It also comes with draft slides that anyone can use if they intend to teach NLP. This book is the first in this list to contain chapters addressing chatbots, speech recognition, dialog systems, and speech-to-text software fundamentals.
№4: The Handbook of Computational Linguistics and Natural Language Processing
The Handbook of Computational Linguistics and Natural Language Processing is one of my absolute favorite NLP books of all time. The reason is that this book goes beyond a high-level view of NLP algorithms and applications. It takes you on a deep dive into the theory behind parsing and computational linguistics and the state of the field today.
I would say that this book is more academic than the books mentioned earlier in this list, mainly because it explains in detail the theories behind the different aspects of the field, as well as their practical side.
This book is a great reference source for everything NLP and computational linguistics, from text parsing to sentiment analysis, neural networks, and speech recognition.
№5: Foundations of Statistical Natural Language Processing
Another great book with a Stanford University connection. Foundations of Statistical Natural Language Processing, by Christopher Manning and Hinrich Schütze, is used in one of the courses taught at Stanford University. In fact, this book is used as a reference book in many universities around the world.
NLP is one of many data science branches, and all data science branches depend heavily on maths and statistics. This book goes over the maths and statistics principles you’ll need to fully understand and implement NLP algorithms.
It covers a range of topics, from statistical inference (n-grams) to Markov chains and how they are used in NLP, text clustering and categorization, and information retrieval. The official website also offers free slides that explain the different chapters of the book.
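As a rough illustration of how n-grams and Markov chains fit together, the sketch below estimates a first-order Markov (bigram) model by maximum likelihood: the probability of the next word depends only on the current word. This is my own simplified example rather than code from the book; the function name `train_bigram_model`, the `<s>` and `</s>` boundary markers, and the toy corpus are all assumptions.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Estimate P(next_word | word) by maximum likelihood from tokenized sentences."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # Pad with start and end markers so sentence boundaries are modeled too.
        padded = ["<s>"] + sentence + ["</s>"]
        for prev, nxt in zip(padded, padded[1:]):
            counts[prev][nxt] += 1

    # Turn raw counts into conditional probabilities for each preceding word.
    model = {}
    for prev, followers in counts.items():
        total = sum(followers.values())
        model[prev] = {word: count / total for word, count in followers.items()}
    return model


# Hypothetical toy corpus, chosen only for illustration.
corpus = [
    ["i", "like", "nlp"],
    ["i", "like", "books"],
    ["i", "read", "books"],
]
model = train_bigram_model(corpus)

print(model["i"])
print(model["like"])
```

On real data, unseen word pairs would get a probability of zero under this estimate, which is why practical models add some form of smoothing.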
№6: Linguistic Fundamentals for Natural Language Processing: 100 Essentials from Morphology and Syntax
Natural language processing is a field that combines two things: computation and linguistics. When people start learning NLP, they often consider only the technical aspect of the field, such as learning to code, or the maths and fundamentals of machine learning.
The last book on this list is all about linguistics. In many NLP applications, the goal is to extract word dependencies from a given natural language sentence, which basically comes down to understanding the relationships between the different parts of speech. As data scientists, we train computers to understand our languages, so we need to have a good understanding of their mechanisms ourselves.
This book presents solid information about the syntactic sentence structure of human languages, providing much useful information that can be used to build and train sophisticated models that perform NLP tasks better and more accurately.
Takeaways
If you try Googling "NLP books", you’ll end up with hundreds, probably thousands, of results. Although reading is the best way to gain information, reading hundreds of books to learn something may not be realistic, especially if you want to switch careers, earn a certificate, pass a class, or just learn a new skill.
So, how do you decide which books to read first?
I have read many books about all aspects of data science. Some I read to mentor others, some for research, and some just because I love NLP. Regardless of the reason, in this article I presented the 6 NLP books that I feel helped build my knowledge base the most.
These 6 books are just my opinion, and they are only a starting point for learning NLP. Learning data science is an ongoing process: as long as technology advances, new algorithms, new techniques, and new languages will come up. As data scientists, we have to keep up to date with all new technologies.
I know that may sound like too much, but being a data scientist is just like anything else in life. It’s a journey, and you learn new things with every step. All you have to do is go ahead and take the first step.