MONTHLY EDITION

March Edition: Warm and fuzzy data science

Data Science need not be as cold and distant as the math makes it seem

TDS Editors

Published in Towards Data Science, Mar 1, 2021

As you learn about and practice data science, it becomes harder and harder to avoid abstraction, often in the form of an algorithm coded in a particular programming language or the mathematical characterization of an idea. What I’d like to highlight here is that data science and its component disciplines (statistics, machine learning, etc.) have origins closely tied to our own way of perceiving and thinking about the world. I’d like to invite you to relate every data science principle or big idea you come across to your own life: What are the parallels between a given learning algorithm and how you yourself learn? Whether it’s supervised, unsupervised, or even reinforcement learning, these paradigms and their underlying techniques need not be as abstract as we often make them out to be.

One way to do this is to seek opportunities to talk about data science concepts in a way that a wide audience can relate to. In my experience as a data educator, I often find myself needing, and wanting, to come up with simple scenarios that capture a potentially daunting idea in data science.

For example: I often characterize the concept of generalization in the context of tennis (or any other one-on-one sport). Say an individual trains at one tennis academy, exclusively with the same coach, without ever playing anybody else for several years. After a while, this individual starts beating their coach consistently and decides to crown themselves the best tennis player on Earth. How realistic is this title? Could we say this individual is a case of overfitting? After all, having played their coach so much, they’re bound to have memorized the coach’s patterns and learned to exploit them. What if we were then to have this individual play against a player from another academy? How likely is it that our player will beat their rival?

Even if you don’t practice a sport yourself, I’m sure you can appreciate the merits of playing against a diverse set of opponents. You may train on repetitive motions and techniques, but the ability to strategize, react, and perform under pressure is best developed by continuously testing your abilities against a diverse set of opponents. This is what I characterize as good generalization in learning.
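For readers who like to see the analogy in code, here is a minimal sketch in plain Python. Everything in it is an illustrative assumption rather than part of the original story: the toy data, the alternating "quirk" that stands in for the coach's habits, and the helper names `memorizer`, `fit_line`, and `mse`. The memorizer plays the tennis student who only ever faced one coach, while the straight-line fit plays a student who learned the general trend.

```python
# Toy illustration of overfitting vs. generalization (all data is made up).
# The "coach" is the training set; the "rival" is data the model has never seen.

def true_trend(x):
    """Underlying pattern we would like a learner to pick up (assumed: y = 2x)."""
    return 2.0 * x

# Training points carry a fixed alternating quirk (+1.5, -1.5),
# standing in for the coach's personal habits.
train_x = [float(i) for i in range(10)]
train_y = [true_trend(x) + (1.5 if i % 2 == 0 else -1.5)
           for i, x in enumerate(train_x)]

# Held-out points sit between the training points and, for simplicity,
# follow the underlying trend exactly (no quirk).
test_x = [i + 0.5 for i in range(9)]
test_y = [true_trend(x) for x in test_x]

def memorizer(x):
    """1-nearest-neighbour: repeat the answer seen at the closest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(train_x, train_y)

def line_model(x):
    """Simple model that captures the trend instead of individual points."""
    return slope * x + intercept

def mse(model, xs, ys):
    """Mean squared error of a model on a set of points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# For each model: (error against the coach, error against the rival).
results = {
    "memorizer": (mse(memorizer, train_x, train_y), mse(memorizer, test_x, test_y)),
    "line": (mse(line_model, train_x, train_y), mse(line_model, test_x, test_y)),
}

for name, (train_err, test_err) in results.items():
    print(f"{name:10s} train MSE = {train_err:.2f}  held-out MSE = {test_err:.2f}")
```

In this toy setup the memorizer is perfect against its "coach" (zero training error) yet does noticeably worse than the straight line on the held-out points, which is the same gap our self-crowned tennis champion would face against a rival from another academy.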

Go ahead and challenge yourself to explain abstract concepts in simple, yet meaningful and correct ways. Try to explain them to a friend or colleague and gauge their interest and agreement. Take their feedback on board and strive to become a better data science communicator. After all, clear and engaging communication is one of the most overlooked skills of the modern data scientist!

Can you think of other concepts worth communicating more accessibly? Exploration vs. exploitation? The bias-variance tradeoff? Independence in probability? Regularization? Ensembles? Gradient descent? Share your ideas down below!

Sources that might interest you:
- https://arxiv.org/pdf/1702.07800.pdf
- https://www.frontiersin.org/articles/10.3389/fevo.2020.00082/full
- https://link.springer.com/article/10.1007/BF02478259
- https://archive.org/details/in.ernet.dli.2015.226341

Sergio E. Betancourt, Editorial Associate at Towards Data Science