Data Science Unicorns, RAG Pipelines, a New Coefficient of Correlation, and Other April Must-Reads

TDS Editors
Towards Data Science
4 min read · May 2, 2024

Feeling inspired to write your first TDS post? We’re always open to contributions from new authors.

Some months, our community appears to be drawn to a very tight cluster of topics: a new model or tool pops up, and everyone’s attention zooms in on the latest, buzziest news. Other times, readers seem to be moving in dozens of different directions, diving into a wide spectrum of workflows and themes. Last month definitely belongs to the latter camp, and as we looked at the articles that resonated the most with our audience, we were struck (and impressed!) by their diversity of perspectives and focal points.

We hope you enjoy this selection of some of our most-read, -shared, and -discussed posts from April, which include a couple of this year’s most popular articles to date, and several top-notch (and beginner-friendly) explainers.

Monthly Highlights

  • The Math Behind Neural Networks
    By now, few of you need an introduction to Cristian Leo’s series of guides to the essential concepts of machine learning. Perhaps none of these building blocks is more essential than neural networks, so it comes as no surprise that this deep dive into their underlying math became such a success among our readers.
  • Pandas: From Messy To Beautiful
    It’s always a joy to see an author’s first TDS article strike a chord with a wide audience; this is precisely what happened with Anna Zawadzka’s practical guide to improving your Pandas code, which provides actionable tips for keeping it “clean and infallible.”
  • A New Coefficient of Correlation
    True breakthroughs in statistics don’t arrive very often these days—which explains why Tim Sumner’s article on a recent paper, which introduced a “new way to measure the relationship between two variables just like correlation except possibly better,” generated a massive response from data professionals.
  • How to Build a Local Open-Source LLM Chatbot With RAG
    Several months after making their initial splash in ML circles, RAG approaches seem to have lost none of their shine. Dr. Leon Eversberg’s tutorial is a case in point: it adds a novel solution to a growing list of tools that allow us to “talk” to our PDF documents.
  • Deep Dive into Transformers by Hand
    Transformers guides and technical walkthroughs aren’t exactly hard to find. What sets Srijanie Dey, PhD’s contribution apart is its accessibility and clarity, which, along with its well-executed illustrations, made it a particularly strong resource for beginners and visual learners.
  • From Data Scientist to ML / AI Product Manager
    Making a career transition is never a trivial endeavor, and even less so during a difficult period for job seekers. Anna Via offered a generous dose of inspiration, along with more than a few actionable tips and insights, based on her own successful role switch to become a machine learning product manager.
  • The 4 Hats of a Full-Stack Data Scientist
    What does it take to become a genuine “full-stack” data professional? Shaw Talebi recently launched a series exploring (and answering) this question in detail; this post, the first in the sequence, provides a high-level perspective on the core skills of a data scientist who can “see the big picture and dive into specific aspects of a project as needed.”
  • Meet the NiceGUI: Your Soon-to-be Favorite Python UI Library
    It’s tough to keep track of all the exciting new libraries, packages, and platforms announced every day—which is why a detailed, opinionated, firsthand review can be so useful. That’s precisely what Youness Mansar sets out to accomplish with his intro to NiceGUI, an open-source Python-based UI framework.
  • Linear Regressions for Causal Conclusions
    More often than not, keeping things simple is the key to success. That’s a point that Mariya Mansurova drives home again and again in her guide to drawing causal conclusions in the context of product analytics, which avoids fancy algorithms and complex equations in favor of tried-and-true linear regressions.

Our latest cohort of new authors

Every month, we’re thrilled to see a fresh group of authors join TDS, each sharing their own unique voice, knowledge, and experience with our community. If you’re looking for new writers to explore and follow, just browse the work of our latest additions, including Thomas Reid, Rechitasingh, Anna Zawadzka, Dr. Christoph Mittendorf, Daniel Manrique-Castano, Maxime Wolf, Mia Dwyer, Nadav Har-Tuv, Roger Noble and Martim Chaves, Oliver W. Johnson, Tim Sumner, Jonathan Yahav, Nicolas Lupi, Julian Yip, Nikola Milosevic (Data Warrior), Sara Nóbrega, Anand Majmudar, Wencong Yang, Shahzeb Naveed, Soyoung L, Kate Minogue, Sean Sheng, John Loewen, PhD, Lukasz Szubelak, Pasquale Antonante, Ph.D., Roshan Santhosh, Runzhong Wang, Leonardo Maldonado, Jiaqi Chen, Tobias Schnabel, Jess.Z, Lucas de Lima Nogueira, Merete Lutz, Eric Boernert, John Mayo-Smith, Hadrien Mariaccia, Gretel Tan, Sami Maameri, Ayoub El Outati, Samvardhan Vishnoi, Hans Christian Ekne, David Kyle, Daniel Pazmiño Vernaza, Vu Trinh, Mateus Trentz, Natasha Stewart, Frida Karvouni, Sunila Gollapudi, and Haocheng Bi, among others.

Thank you for supporting the work of our authors! We love publishing articles from new authors, so if you’ve recently written an interesting project walkthrough, tutorial, or theoretical reflection on any of our core topics, don’t hesitate to share it with us.

Until the next Variable,

TDS Team

Building a vibrant data science and machine learning community. Share your insights and projects with our global audience: bit.ly/write-for-tds