The Variable
What Problem Is Your Data Solving?
Our weekly selection of must-read Editors’ Picks and original features
The evergreen popularity of careers in data science is the result of many factors, from shifts in the labor market to advances in cloud computing. It also hinges, though, on a fundamental idea: smart and passionate people look for work that feels meaningful. And meaningful work, by definition, answers important questions and solves real-world problems.
Bias in hiring is one such problem, and Grégoire Martinon was interested in examining AI’s role in perpetuating it—and, hopefully, its potential to end it. The result is a thought-provoking article that shows just how tricky it is to isolate the causes of bias, let alone render them powerless. Grégoire’s deep dive is absolutely worth your time.
There are many other challenges data scientists tackle every day, and they range from the theoretical all the way to the physical. What they have in common is the need for creativity and open-mindedness. Our recent conversation with machine learning engineer Mark Saroufim is a case in point: Mark’s career spans disciplines and ever-changing interests, but its common thread is the boundless curiosity he applies to challenges.
Sometimes the issues data scientists face are organizational: how do you build an analytics stack and hire a data team from near-scratch? Veronica M. Zhai has gone through that process at both a financial-services giant and a fast-growing startup, and she shares five key suggestions for scaling data operations based on these experiences.
Having a seasoned data team in place doesn’t always guarantee success. Many things can go wrong when skills, expectations, and workflows are out of sync. Cassie Kozyrkov’s latest post addresses precisely these situations. She discusses some of the most common symptoms that suggest your AI project is doomed—and offers ideas for preemptively addressing them.
For Elena Stamatelou, a major obstacle in product development is that even when huge amounts of data are available, “if we start immediately by looking at the data, we will probably get lost while trying to understand the data columns and fields and forget about the initial problem.” Read her introduction to design thinking to see how that approach informs her data science work.
Taking one giant step back, Gadi Singer reflects on the current limitations of AI and how all the progress we’ve seen in this field might just be a prelude to deeper knowledge—a stage where AI could go beyond the surface patterns of language and into deeper abstractions and representations. Advances in AI also come with major risks, though, and none might be more critical than those in the emerging industry of automated weapons. Jeremie Harris and Jakob Foerster discussed the ethical and political stakes involved in these AI-powered systems in a recent episode of the TDS Podcast.
As readers of TDS, you probably know that we have a soft spot for hands-on problem-solving and are extremely proud of the deep archive of tutorials our contributors have published with us over the years. Let’s round out this week’s Variable with some of the best recent additions:
- If you’ve heard about Transformers but weren’t sure what they are, what they do, and why you should care, Dale Markowitz has written the clear, accessible explainer you needed.
- Once you’ve read Dale’s post, a great follow-up would be Davide Coccomini’s walkthrough of DINO, a recently unveiled approach to training Vision Transformers, and “one of the most interesting advances in the field of computer vision.”
- Ondřej Cífka shared a comprehensive guide to NoPdb, the non-interactive, programmatic Python debugger he created with an eye towards machine-learning models.
- Going back to basics, Matt Sosna wrote a thorough and accessible post on statistics, laying out “a set of foundational skills that will get you started for your role, no matter where you go.”
What problems are you solving with data science? What are the challenges you’re most passionate about? If the posts we shared this week inspire you to reflect on your own work, consider writing about it for TDS. (If you’re just getting started, our free, two-week email guide might give you the gentle nudge you need.)
Thank you for joining us this week and for supporting our far-flung community.
Until the next Variable,
TDS Editors
Recent additions to our curated topics:
Getting Started
- Individual Recourse for Black Box Models by Patrick Altmeyer
- What’s Explainable AI? by Omer Mahmood
- Simple Physics Animations Using VPython by Zhiheng Jiang
Hands-On Tutorials
- Similarity Encoding for Dirty Categories Using dirty_cat by Khuyen Tran
- How to Render a 3D Mesh and Convert It to a 2D Image Using PyTorch3D by Adele Kuzmiakova
- Image Captions with Attention in TensorFlow, Step by Step by Ketan Doshi
Deep Dives
- Who’s Who and What’s What: Advances in Biomedical Named-Entity Recognition (BioNER) by Sybren Jansen
- Enhancing Autoencoders with Memory Modules for Anomaly Detection by Varun Menon
- A Comparison of Synthetic vs. Human Labeled Dataset to Train a UNet Segmentation Model by Aaron Soellinger