Is Hands-On Knowledge More Important than Theory?
We see these debates ebb and flow almost every week on TDS: should data scientists first master high-level concepts and get fluent in, say, probability theory—or dig right into the (occasionally) messy world of model tuning and data cleaning? In truth, the best posts we read and share blend these two sides of data science seamlessly. This week’s lineup is no exception; we focused on the practical and the hands-on a bit more than we usually do, but don’t worry: our authors always see the forest and the trees. Let’s get to it!
- Learn how to work around blind spots in autoregressive models. Jessica Dafflon, Walter Hugo Lopez Pinaya, and Pedro Ferreira da Costa follow up on their previous work on DeepMind’s PixelCNN, “a deep neural network that captures the distribution of dependencies between pixels in its parameters.” In their latest post, they tackle one of the model’s biggest limitations—the “blind-spot problem”—and show how you can fix it.
- Get acquainted (or reacquainted) with SMOTE. In his most recent explainer, Joos Korstanje introduces readers to SMOTE (Synthetic Minority Oversampling Technique), a machine learning approach that addresses imbalanced datasets by generating synthetic examples of the minority class. Joos starts with the big picture and then dives into the nitty-gritty, sharing a full example in Python.
- Explore real-world use cases for hidden Markov models. “There are many tools for analyzing sequential data,” says Field Cady, “and they each have their strengths.” In this walkthrough focused on hidden Markov models, Field explains how they work and in what scenarios it would make the most sense to use them instead of, say, an LSTM model.
- Add pops of color to your Pandas DataFrames. Styling your DataFrames is about much more than aesthetics—as Zolzaya Luvsandorj shows in her handy tutorial, adding colors, gradients, and other visual cues makes them more engaging and easier to analyze and remember.
- Master a powerful variance-reduction method (or several). Stratification, CUPED, variance-weighted estimators… which one should you use to ensure your A/B test or online experiment has high statistical power? Sophia Yang guides us through some of the most effective approaches, and shares enough code examples to keep you busy tinkering for a while.
- Catch up on some of our most popular tips-and-tricks articles. Looking for even more actionable advice on tools, approaches, and learning strategies? Don’t miss our recent roundup of popular posts by the likes of Rashida Nasrin Sucky, Sharan Kumar Ravindran, Sara A. Metwalli, and Aliaksei Mikhailiuk (among others).
We hope you learned something new and exciting this week, whether on TDS or in a totally different area of your life. Thank you for reading our posts; if you’re ever looking for a hands-on way to support our work, consider becoming a Medium member.
Until the next Variable,
TDS Editors
P.S. Stay tuned: Jeremie Harris and the TDS Podcast are returning soon with a new season and an exciting lineup of new guests.
Recent additions to our curated topics:
Getting Started
- Make Beautiful 3D Plots in R—An Enhancement to the Storytelling by Xichu Zhang
- Explore and Understand Your Data with a Network of Significant Associations by Erdogan Taskesen
- Why Does Markov Decision Process Matter in Reinforcement Learning? by Mariko Sawada
Hands-On Tutorials
- Particle Filter Localization with Webots and ROS2 by Debby Nirwan
- How to Tune HDBSCAN by Charles Frenzel
- Exploring Use Cases of Machine Learning in the Geosciences by Martin Palkovic
Deep Dives
- Finding Families in the Wild by Joseph Robinson, PhD
- Simulating Traffic Flow in Python by Bilal Himite
- AI-Tunes: Creating New Songs with Artificial Intelligence by Robert A. Gonsalves