MONTHLY EDITION

Computers are great at working with structured data like spreadsheets and database tables. Humans, however, mostly communicate in words, so a lot of the world's information is unstructured: raw text in English or another language, for example. How can we get a computer to understand unstructured text and extract information from it?
Natural Language Processing (NLP) is the subfield of AI focused on enabling computers to understand and process human languages. If you're new to data science, you'll find an abundance of material covering all kinds of NLP tasks. The most common NLP blog posts I've seen are about sentiment analysis, that is, detecting whether a piece of text expresses a positive or negative sentiment. But NLP covers many more problems than that.
I'd like to draw your attention to topic modelling, a field within NLP that I've recently started taking a serious interest in. Topic modelling uses the distribution of words across a collection of documents to identify latent patterns of word occurrence. The output is a set of topics, each a cluster of words that tend to co-occur in those documents.
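To make this concrete, here is a minimal sketch of one common approach, non-negative matrix factorisation (NMF), the technique covered in one of the articles listed in this edition. It factorises a document-term count matrix V into document-topic weights W and topic-word weights H so that V ≈ WH; the highest-weighted words in each row of H form a topic. The sketch assumes a toy corpus tokenised on whitespace and uses only the Python standard library. The function names (`build_matrix`, `nmf`, `top_words`) are hypothetical, and in practice you would reach for a library such as scikit-learn or Gensim.

```python
import random


def build_matrix(docs):
    """Build a document-term count matrix (rows = documents, columns = words).
    Assumes naive lowercase whitespace tokenisation."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: j for j, w in enumerate(vocab)}
    V = [[0.0] * len(vocab) for _ in docs]
    for i, d in enumerate(docs):
        for w in d.lower().split():
            V[i][index[w]] += 1.0
    return V, vocab


def matmul(A, B):
    """Plain-Python matrix product A @ B."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def frobenius_error(V, W, H):
    """Squared reconstruction error ||V - WH||^2."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))


def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Factorise V ~ W @ H with Lee-Seung multiplicative updates.
    W: document-topic weights (n x k), H: topic-word weights (k x m)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    # Strictly positive random initialisation keeps the updates well defined.
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[h * nu / (de + eps) for h, nu, de in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[w * nu / (de + eps) for w, nu, de in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H


def top_words(H, vocab, n=3):
    """Return the n highest-weighted words for each topic (row of H)."""
    return [[vocab[j] for j in sorted(range(len(row)), key=lambda j: -row[j])[:n]]
            for row in H]


if __name__ == "__main__":
    # Hypothetical toy corpus: two pet documents, two finance documents.
    docs = [
        "cat dog pet cat",
        "dog pet food dog",
        "stock market price stock",
        "market price trade market",
    ]
    V, vocab = build_matrix(docs)
    W, H = nmf(V, k=2)
    for t, words in enumerate(top_words(H, vocab)):
        print(f"Topic {t}: {words}")
```

The multiplicative updates keep every entry of W and H non-negative, which is what lets each topic be read as an additive bundle of words. On a corpus this small the resulting topics are only illustrative.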
Why do I think topic modelling is interesting? Because these days, more than ever, it's not only about how we feel but also about what is being said. Combined, sentiment analysis and topic modelling can be used to perform what's called aspect-based sentiment analysis, where the goal is to extract both the entities described in a text and the sentiment expressed towards each of them.
For businesses, exploring how customers react to particular parts of a product or service can support use cases including product development and quality control, communications, customer support, and decision-making. This yields far more information than simply knowing whether customers are happy or unhappy, and it can support continuous development and improvement across the business.
Lowri Williams, Editorial Associate at Towards Data Science
Interactive Topic Modeling with BERTopic
An in-depth guide to topic modeling with BERTopic
By Maarten Grootendorst – 7 min read
Topic Modeling Articles with NMF
Extracting topics is a good unsupervised data-mining technique to discover the underlying relationships between texts.
By Rob Salgado – 12 min read
Topic Modeling Tutorial with Latent Dirichlet Allocation (LDA)
A practical guide with proven hands-on Python code. Find what people are tweeting about.
By Michel Kana, Ph.D – 5 min read
Introduction to Topic Modeling using Scikit-Learn
Explore 3 unsupervised techniques to extract important topics from documents
By Ng Wai Foong – 10 min read
Understanding NLP and Topic Modeling Part 1
We Explore How Extracting Topics Via NLP Helps Us Data Science Better
By Tony Yiu – 8 min read
Topic Modeling in Power BI using PyCaret
In this post, we will see how we can implement topic modeling in Power BI using PyCaret.
By Moez Ali – 7 min read
Topic Modelling: Going Beyond Token Outputs
An investigation into how to assign topics with meaningful titles
By Lowri Williams – 9 min read
Topic modelling with PLSA
PLSA or Probabilistic Latent Semantic Analysis is a technique used to model information under a probabilistic framework.
By Dhruvil Karani – 5 min read
Sentiment Analysis: Aspect-Based Opinion Mining
An investigation into sentiment analysis and topic modelling techniques.
By Lowri Williams – 8 min read
Topic Modelling in Python with NLTK and Gensim
In this post, we will learn how to identify which topic is discussed in a document, called topic modelling.
By Susan Li – 6 min read
New podcasts
- David Roodman – Economic history and the road to the singularity
- Ethan Perez – Making AI safe through debate
- Georg Northoff – Consciousness and AI
- Stuart Armstrong – AI: Humanity’s Endgame?
We also thank all the great new writers who joined us recently: Vivienne DiFrancesco, Monica Indrawan, Ouaguenouni Mohamed, Layne Sadler, Kendric Ng, Soroush Safaei, Alexandra Souly, Gant Laborde, abhi saini, Eden Molina, Wojtek Pyrak, Bora Tunca, Sam Ansari, Mahmoud Harmouch, Ajay Arunachalam, Maxim Ziatdinov, Sajjad Shumaly, Juan Samuel, Serhii Pospielov, Fernando Carrillo, Yann Morize, Sebastian Carino, Peng Yan, Paul Brunzema, Anders Borges, Ben Bogart, Xiao-Yang Liu, Alex Wagner, Michele Cavazza, Dimitris Dais, Julian Hatzky, Evans Doe Ocansey, Prajwalan Karanjit, Iqbal Ali, Stefan Hrouda-Rasmussen, Mike Casale, Maham Faisal Khan, Zainul Arifin, Silja Voolma, Ph.D., Will Nobles, Ben Santos, Mai Stafford, and many others. We invite you to take a look at their profiles and check out their work.