From chatbots to sentiment analysis, we’re seeing an explosion of real-world use cases for textual data. Some of the buzziest innovations in AI revolve around models trained with ever-increasing quantities of text; on the flip side, we can trace many of the challenges the field is facing to limited, unrepresentative, or flat-out biased language datasets.
This week, we share six recent posts that cover data and language through a wide range of topics and approaches—NLP fans will have a blast, but so will programmers, data engineers, and AI enthusiasts. Let’s dive in!
- The wall all large language models run into (for now). GPT-3 and similar generative models can produce text that sounds convincing even when it isn’t factual. Iulia Turc explores the issue of these models’ groundedness (“the ability to ground their statements into reality, or at least attribute them to some external source”) and why it has been so difficult to develop models that come close to human performance.
- Natural language querying is making a splash. Up until recently, humans had to invent (and then learn) complex languages in order to communicate with computers and manipulate digital data. Andreas Martinson discusses the emerging world of NLQ—natural language querying—and how it might transform the work of data professionals for the better, as well as democratize access to databases.
- Choosing the right tools to simplify complex NLP tasks. The difference between clunky and streamlined workflows can sometimes come down to seemingly trivial choices. Kat Li surveys five lesser-known Python libraries, from Pyspellchecker to Next Word Prediction, and explains how they can save time and effort when used in the right NLP context.
- How to translate your text-derived findings into compelling visuals. If you’re in a tinkering mood, you’ll enjoy Petr Korab’s latest tutorial. Going beyond the usual suspects (looking at you, word cloud!), this tutorial walks us through the creation of more advanced, and more polished, visualizations, including chord diagrams and packed bubble charts.
- On algorithms and spelling. In a neat new project, Socret Lee built a Khmer spellchecker as part of a bigger keyboard app. Socret’s writeup patiently explains the process and zooms in on two concepts that proved crucial for implementing the tool: BK-trees and edit distance (a toy sketch follows this list).
- When sentiment analysis, pop culture, and social media collide. A new season of a popular Netflix show provides the perfect opportunity to analyze text (and emoji!) at scale. Amanda Iglesias Moreno’s latest article leverages the Twitter API to study polarity in tweets about the Regency-era-set Bridgerton.
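If you’d like a feel for the two ideas in Socret’s post before reading the full writeup, here is a minimal, illustrative sketch of our own (not Socret’s implementation, and using a toy English word list rather than Khmer): a classic Levenshtein edit-distance function, plus a tiny BK-tree that uses it to retrieve candidate corrections within a distance threshold.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1]


class BKTree:
    """BK-tree: a tree keyed on edit distance, so a lookup only explores
    branches whose distance to the query can still fall within the threshold."""

    def __init__(self):
        self.root = None  # each node is (word, {distance: child_node})

    def add(self, word: str) -> None:
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = edit_distance(word, node[0])
            if d in node[1]:
                node = node[1][d]
            else:
                node[1][d] = (word, {})
                return

    def search(self, word: str, max_dist: int) -> list[tuple[int, str]]:
        results, stack = [], [self.root] if self.root else []
        while stack:
            node_word, children = stack.pop()
            d = edit_distance(word, node_word)
            if d <= max_dist:
                results.append((d, node_word))
            # Triangle inequality: only children at distances within
            # [d - max_dist, d + max_dist] can contain matches.
            for dist, child in children.items():
                if d - max_dist <= dist <= d + max_dist:
                    stack.append(child)
        return sorted(results)


tree = BKTree()
for w in ["hello", "help", "hell", "shell", "smell"]:
    tree.add(w)
print(tree.search("helo", max_dist=1))  # [(1, 'hell'), (1, 'hello'), (1, 'help')]
```

The pruning step is the whole point: because edit distance is a metric, the tree can skip entire branches that provably can’t contain a close-enough word, so the spellchecker avoids comparing the query against every entry in the dictionary.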
There’s always more to explore on TDS, so we hope you still have some time and stamina for a handful of excellent reads on other topics; we just couldn’t not share these with you.
- If you’ve been missing Carolina Bento’s crystal-clear, expertly illustrated tutorials, you’re in luck: a new one just landed on TDS, and it explains RNNs (recurrent neural networks) with a real-life example and ample code snippets.
- Bird-loving data scientists will find Benedict Neo’s new project both fun and interesting: it attempts to classify bird species based on genetic attributes and location.
- Learn about fast and efficient approximate nearest neighbor search by following along with Peggy Chang’s latest tutorial. It covers similarity search through a combination of an inverted file index (IVF) and product quantization (PQ); a minimal example follows this list.
- To end on a dose of practical inspiration, Pau Labarta Bajo shared several hard-earned insights on boosting your ML skills so you can excel in real-world contexts, which are always messier and more complex than what you learned in courses or bootcamps.
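For a quick taste of what IVF and PQ look like in practice, here’s a minimal sketch using Faiss, one common library implementing this index type (our choice for illustration, not necessarily the one Peggy’s tutorial uses; the parameter values below are arbitrary):

```python
import numpy as np
import faiss  # pip install faiss-cpu

d = 64                    # vector dimensionality
nlist = 100               # number of inverted-file cells (coarse clusters)
m = 8                     # PQ sub-vectors per vector (must divide d)
nbits = 8                 # bits per sub-vector code (256 centroids each)

rng = np.random.default_rng(0)
xb = rng.random((10_000, d), dtype=np.float32)  # database vectors
xq = rng.random((5, d), dtype=np.float32)       # query vectors

# The coarse quantizer routes vectors to IVF cells; PQ then compresses
# each vector into a short m-byte code.
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)

index.train(xb)           # learn coarse centroids and PQ codebooks
index.add(xb)             # encode and store the database vectors

index.nprobe = 10         # visit 10 of the 100 cells per query
distances, ids = index.search(xq, 4)
print(ids)                # approximate nearest-neighbor ids, shape (5, 4)
```

Because each query scans only a handful of cells and compares compressed codes rather than full vectors, the search is far faster and lighter on memory than exhaustive comparison, at the cost of some recall.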
Thank you, as always, for your passion and curiosity. To support the work we publish, consider sharing your favorite article on Twitter or LinkedIn, telling your Data Science colleagues about us, and/or becoming a Medium member.
Until the next Variable,
TDS Editors