At its core, data science is a discipline whose purpose is to help people—business owners, doctors, educators, public servants—make good real-world decisions. The assumption is that outcomes improve as our actions become more strongly informed by data. But is that always true? Can data lead us astray?
Vishesh Khemani explores this question in depth in his recent post on reaching the best possible decisions. Taking the example of his daughter’s new smartphone (and whether or not buying insurance for it makes sense), he explains the math behind our daily decision-making processes. It’s a great introduction to applied statistics and a handy guide for making rational decisions.
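For a rough feel of the kind of arithmetic involved, here is a minimal sketch of an expected-cost comparison. It is not taken from Vishesh's post: the function names and every number in it (damage probability, repair cost, premium, deductible) are hypothetical, and it assumes a single possible damage event and a decision-maker who only cares about average cost.

```python
# Hypothetical expected-cost comparison for a phone insurance decision.
# All names and figures are illustrative assumptions, not from the post.

def expected_cost_without_insurance(p_damage: float, repair_cost: float) -> float:
    """Average out-of-pocket cost if we skip insurance."""
    return p_damage * repair_cost


def expected_cost_with_insurance(
    premium: float, p_damage: float, deductible: float
) -> float:
    """Average cost if we insure: the premium is certain, the deductible is not."""
    return premium + p_damage * deductible


def insurance_is_cheaper_on_average(
    p_damage: float, repair_cost: float, premium: float, deductible: float
) -> bool:
    """True when insuring has the lower expected cost under these assumptions."""
    insured = expected_cost_with_insurance(premium, p_damage, deductible)
    uninsured = expected_cost_without_insurance(p_damage, repair_cost)
    return insured < uninsured


if __name__ == "__main__":
    # Made-up inputs: 10% chance of damage, $500 repair, $80 premium, $50 deductible.
    print(insurance_is_cheaper_on_average(0.10, 500.0, 80.0, 50.0))
```

A comparison like this ignores risk aversion: someone who could not absorb a large repair bill might still reasonably insure even when the average cost is higher.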
Our other highlights this week cover even more facets of decision-making. They range from choosing a specific algorithm for a given task to evaluating the performance of soccer players. Let’s dive in.
- Take a look under the hood of a recommender system. Many of you reading The Variable also receive Medium’s weekly digest of recommended posts. Sejal Dua wanted to gain a deeper understanding of how that list is composed and what it might say about her habits, so she reverse-engineered it using the topic-modeling algorithm BERTopic.
- Learn about the tricky process behind evaluating skills. Coaches used to rely on memory and instinct to shape their team’s strategy; with the advent of sports analytics, this is no longer enough. Ofir Magdaci’s latest foray into the world of football (or soccer, if you insist) shows just how complex it is to evaluate a player’s potential and skills.
- Explore a powerful tool for visualizing big data. Sometimes, the difference between success and failure, good and bad outcomes, hinges on small details like the tools we choose for our work. Sophia Yang’s latest contribution walks us through her Datashader workflow and explains how to use it to create effective visualizations of massive datasets quickly.
- Choose the right metric to assess a model’s performance. "When viewed as an overall service," says Aparna Dhinakaran, "the ML application also has to be measured by its overall service performance." Diving into the details around service latency and inference latency, Aparna explains how to optimize models—and our monitoring of models—to produce the best possible results.
- Get a deeper understanding of the stakes around AI regulation. As conversations around AI safety and algorithmic bias have proliferated in recent years, so have those around the need for better governance and regulations. On the TDS podcast, Jeremie Harris recently chatted with Anthony Habayeb about the implications of this development for businesses, governments, and practitioners.
Thanks, as always, for reading, sharing, and supporting our authors’ work.
Until the next Variable,
TDS Editors
Recent additions to our curated topics:
Getting Started
- The Eclat Algorithm by Joos Korstanje
- Creating a Data Science Portfolio by Maarten Grootendorst
- 7 Essential Python Skills for Research by Thomas Hikaru Clark
- These 17 Projects Will Teach You Python Way Better than "Hello World" by Zulie Rane
Hands-On Tutorials
- Speeding Up Data Analysis with TimescaleDB and PostgreSQL by Miranda Auhl
- What’s in a "Random Forest?" Predicting Diabetes by Raveena Jayadev
- Where to Place Wards in DOTA2 by Nadim Kawwa
- Close Encounters of the K-Means Kind by Will Crowley
Deep Dives
- Towards Removing Gender Bias in Writing by Srihitha Pallapothula
- The "Frequently Bought Together" Recommendation System by Ben Bogart
- Does Neighborhood Trapping Work? by Sam McClatchie
- Tune Your Machine Learning Workflow with Weights and Biases by Jean-Michel D
Thoughts and Theory
- Performing Deduplication with Record Linkage and Supervised Learning by Sue Lynn
- Word2vec with PyTorch: Implementing the Original Paper by Olga Chernytska
- A Journey towards Faster Reinforcement Learning by Yann Berthelot
- A Python Framework to Retrieve and Process Hyperspectral Field Measurements from TRIOS Sensors (MDSA format) by Maurício Cordeiro