The world’s leading publication for data science, AI, and ML professionals.

June Edition: Learning from Industry

Moving beyond tutorials, courses, and side projects

MONTHLY EDITION

Photo by olia danilevich from Pexels

Courses, textbooks, blog posts, and personal projects are great ways to learn data science. But it’s also widely acknowledged that these aren’t enough to actually start working in a data science role. Vicki Boykis, a senior machine learning engineer at Automattic, identified this as a problem of implicit versus explicit knowledge. Explicit knowledge is what we have easy access to, as it’s written down somewhere for us to learn. Implicit knowledge is what we gain by learning on the job; it resists being packaged into a textbook or article. Implicit knowledge can be a barrier for newcomers to data science who don’t have access to on-the-job training or professional mentorship.

At TDS, we try to bridge this gap for our readers by identifying articles that codify implicit knowledge in our field. We created the column Notes from Industry to curate and highlight the best pieces on data science challenges, applications, and solutions encountered in the real world.

In the following posts, students can get a glimpse of what real-world data challenges look like. Junior data scientists will pick up best practices from practitioners. And senior data scientists may find some interesting gems that cover what their peers are implementing to solve commonly encountered problems.

Elliot Gunn, Editor/Analyst at Towards Data Science


PR Reviews for SQL code

No guide existed on how to review SQL code for a data science use case. So the author created a checklist for peer review.

By Marc-Olivier Arsenault – 10 min read


Data Cleaning IS Analysis, Not Grunt Work

Most articles on data cleaning lack depth. This article looks at what data cleaning actually is and argues that it’s a key form of data analysis that should be termed "building reusable transformations."

By Randy Au – 19 min read


Lessons on ML Platforms – from Netflix, DoorDash, Spotify, and more

Dives into the ML platform components and tools used by large tech companies.

By Ernest Chan – 12 min read


Analytics Lifecycle Management

A data science manager looks at the full analytics lifecycle, from problem formulation to sunsetting the solution.

By Ying Li – 7 min read


What you can learn from one extra experiment

Investigates why relying on A/B testing alone can be insufficient and looks at alternative experimentation approaches that yield more insights.

By Kevin Dunn – 11 min read


Data as Code – Achieving Zero Production Defects for Analytics Datasets

Applying tools and practices from high-performing software engineering teams to data workflows.

By Sven Balnojan – 9 min read


Strategists: Stop Obsessing about Averages

Analysts need to go beyond means, medians, and modes. Outliers provide new information about the future.

By Roger Martin – 7 min read


The 7 Tasks in Data Science Management

This article looks at the complexity of data science team management.

By Martin Schmidt and Marcel Hebing – 13 min read


How 2 Build a Cloud-Based ML Ops Framework in 2 weeks

A team of three shares how they built a complete ML Ops framework in two weeks using DevOps best practices.

By Lars Kjeldgaard – 6 min read


Advanced forecasting using Bayesian diffusion modeling

This article presents an advanced modelling approach, Bayesian diffusion modelling, through an end-to-end case study.

By Fraser Lewis – 11 min read



We also thank all the great new writers who joined us recently, including Stephan Tulkens, Xiaoying Wang, Yann-Aël Le Borgne, Mike Bostock, Vaibhav Nandwani, Ying Li, Yasas Sandeepa, Nitish Kumar Thakur, Seth Billiau, Farhad Dalirani, Chanade Hemming, Wilson Wang, Jack Baker, Roberto Martorelli, Klemen Kotar, Debanjana Chakraborty, Benjamin Griffiths, Joi Schünemann, Srikanth Machiraju, John Alling, Vasnetsov Andrey, Suhong Kim, Anna Jacobson, Jesse Ruiz (she/they), Amizorach, Luis Felipe de Souza Rodrigues, Michele Riva, Charles Frenzel, Nur Shlapobersky, Grégoire Martinon, John Pette, Pavel Fokin, Dennis Eilers, Ong Chin Hwee, Luke Griswold, Talia Reich, Ondřej Cífka, Amin Ahmad, Nikhil S Hubballi, Juri Sarbach, Tyler Kim, Alex Thewsey, Steve Golik, Martin Schmidt, Sid Arcidiacono, Varun Menon, David Dale, Sybren Jansen, Joyce Annie George, Veronica M. Zhai, Thomas Olavson, Julia Signell, Lara da Rocha, Yalim Demirkesen, Baptiste Moreau, Patrick Altmeyer, Nikhil Rasiwasia, Zhiheng Jiang, Elliot Humphrey, Tameem Iftikhar, and many others. We invite you to take a look at their profiles and check out their work.

