Making Data Useful

30 Data Science Punchlines

A holiday reading list condensed into 30 quotes

Cassie Kozyrkov

Published in Towards Data Science · 9 min read · Dec 20, 2018

For those of you who like brain food on vacation, here’s a handy index of all my articles from 2018, boiled down to 30 (occasionally cheeky) punchlines to help you avoid/cause awkward silences at family events and holiday parties.

Sections: Data Science and Analytics, ML/AI Concepts, How Not To Fail At ML/AI, Data Science Leadership, Technology, Statistics.

Bonus: Videos, podcasts, foreign language translations for your non-English-speaking friends and family to enjoy, and an end-to-end deep learning tutorial for the Pythonistas among you.

Data Science and Analytics

What on earth is data science? A quick tour of data science, data engineering, statistics, analytics, ML, and AI.

Data science is the discipline of making data useful.

What Great Data Analysts Do — and Why Every Organization Needs Them. Good analysts are a prerequisite for effectiveness in your data endeavors. It’s dangerous to have them quit on you, but that’s exactly what they’ll do if you under-appreciate them.

Each of the three data science disciplines has its own excellence. Statisticians bring rigor, ML engineers bring performance, and analysts bring speed.

Secret Paragraphs from HBR’s Analytics. A collection of musings omitted from the article above. Let’s talk about hybrid roles, the nature of research, Bat Signals, data charlatans, and awesome analysts!

Buyer beware: there are many data charlatans out there posing as data scientists. There’s no magic that makes certainty out of uncertainty.

Top 10 roles in AI and data science. A guide to the job titles, in hiring order.

If a researcher is your first hire, you probably won’t have the right environment to make good use of them.

ML/AI Concepts

The simplest explanation of machine learning you’ll ever read. Machine learning is a thing-labeler where you explain your task with examples instead of instructions.

Machine learning is a new programming paradigm, a new way of communicating your wishes to a computer. It’s exciting because it allows you to automate the ineffable.
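To make "examples instead of instructions" concrete, here is a minimal sketch. The drink-labeling task, the two features, and every number in it are made up for illustration, and the nearest-neighbor approach is just one simple way a labeler could be learned from examples, not a claim about any particular product or library.

```python
# Hypothetical toy task: label a drink as "tea" or "coffee" from two made-up
# features (caffeine in mg, brewing temperature in Celsius).

def label_by_instructions(caffeine_mg, temp_c):
    """Traditional programming: a human writes the rule explicitly."""
    return "coffee" if caffeine_mg >= 60 else "tea"


def learn_from_examples(examples):
    """Machine learning (1-nearest-neighbor): the rule comes from labeled examples."""
    def labeler(caffeine_mg, temp_c):
        def squared_distance(example):
            (c, t), _ = example
            return (c - caffeine_mg) ** 2 + (t - temp_c) ** 2

        # Copy the label of the most similar example we have seen.
        _, nearest_label = min(examples, key=squared_distance)
        return nearest_label

    return labeler


# Illustrative examples only: ((caffeine_mg, temp_c), label)
examples = [
    ((95, 93), "coffee"),
    ((80, 96), "coffee"),
    ((30, 80), "tea"),
    ((45, 85), "tea"),
]

labeler = learn_from_examples(examples)
print(labeler(90, 94))  # closest example is a coffee
print(labeler(35, 82))  # closest example is a tea
```

Notice that nobody typed a threshold into the learned labeler. Change the examples and its behavior changes with them, which is also why the examples had better be relevant.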

Are you using the term ‘AI’ incorrectly? With poorly defined terms, there’s not really such a thing as using them correctly. We can all be winners, but here’s a quick guide to the alphabet soup of AI, ML, DL, RL, and HLI.

If you’re worried that there’s a human-like intelligence lurking in every cupboard, breathe easy. All those industry AI applications are too busy solving real business problems.

Explaining supervised learning to a kid (or your boss). My goal here is to get humans of all stripes comfy with some basic terminology: instance, label, feature, model, algorithm, and supervised learning.

Don’t be intimidated by jargon. For example, a model is just a fancy word for “recipe.”

Machine learning — Is the emperor wearing clothes? A beginner-friendly look at the core concepts — including algorithms and loss functions — via pictures and cat memes.

Don’t hate machine learning for being simple. Levers are simple too, but they can move the world.

Neural networks may as well be called “yoga networks” — their special power is giving you a very flexible boundary.

Unsupervised learning demystified. Unsupervised learning helps you find inspiration in data by grouping similar things together for you. The results are a Rorschach card to help you dream.

Think of unsupervised learning as a mathematical version of making “birds of a feather flock together.”
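For the curious, here is a minimal sketch of that flocking idea using a tiny one-dimensional k-means. The wingspan numbers and the starting centroids are invented for illustration, and k-means is only one of many grouping methods.

```python
def k_means_1d(values, centroids, iterations=10):
    """Group numbers around the nearest centroid, then move each centroid
    to the mean of its group. No labels are used anywhere."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            groups[nearest].append(v)
        # Keep the old centroid if a group ends up empty.
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return groups, centroids


# Hypothetical wingspans in cm: a few small birds and a few large ones.
wingspans_cm = [20, 22, 25, 180, 200, 210]
groups, centroids = k_means_1d(wingspans_cm, centroids=[0.0, 100.0])
print(groups)
```

The algorithm only hands back groups. Deciding whether they mean "sparrows versus eagles" or nothing at all is still your job, which is the Rorschach part.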

Explainable AI won’t deliver. Here’s why. Many people are drawn to XAI because they think it’s a good basis for trust. It isn’t, and getting caught up in the trust hype might mean you’ll miss out on something XAI is great for: inspiration.

If you refuse to trust decision-making to something whose process you don’t understand, then you should fire all your human workers, because no one knows how the brain (with its hundred billion neurons!) makes decisions.

How Not To Fail At ML/AI

Why businesses fail at machine learning. Many businesses don’t realize that applied ML is a very different discipline from ML algorithms research.

Imagine trying to start a restaurant by hiring folks who’ve been building microwave parts their whole lives but have never cooked a thing… what could possibly go wrong?

[Image caption: Which of these are you selling? The right team to hire depends on your answer.]

Advice for finding AI use cases. My brainstorming trick for finding opportunities to apply AI starts with imagining that AI is a hoax…

A common mistake businesses make is to assume machine learning is magic, so it’s okay to skip thinking about what it means to do the task well.

The first step in AI might surprise you. What’s the right way to start an AI project? Get an AI degree? No. Hire an AI wizard? Nope. Pick an awesome algorithm? Not that either. Dive into the data? Wrong again! Here’s how to do it better.

Never ask a team of PhDs to “Go sprinkle machine learning over the top of the business so… good things happen.”

Is your AI project a nonstarter? A (reality) checklist you should go through before you hire any engineers or get any data for an applied ML/AI project.

Don’t waste your time on AI for AI’s sake. Be motivated by what it will do for you, not by how sci-fi it sounds.

Getting started with AI? Start here! A detailed guide to the decision-maker’s role and responsibilities in an applied ML/AI project.

Just because you can do something, doesn’t mean it’s a good use of anyone’s time. We humans fall in love with what we have poured effort into… even if it is a pile of poisonous rubbish.

Whose fault is it when AI makes mistakes? The point of ML/AI is that you’re expressing your wishes using examples instead of instructions. For it to work, the examples have to be relevant.

If you use a tool where it hasn’t been verified safe, any mess you make is your fault. AI is a tool like any other.

Data Science Leadership

Data-Driven? Think again. For a decision to be data-driven, it has to be the data — as opposed to something else entirely — that drive it. Seems so straightforward, and yet it’s so rare in practice because decision-makers lack a key psychological habit.

The more ways there are to slice the data, the more your analysis is a breeding ground for confirmation bias. The antidote is setting your decision criteria in advance.

Is data science a bubble? Learn more about the people calling themselves “data scientists” and why the industry is playing a dangerous game.

“I think you might be hiring data scientists the way a drug lord buys a tiger for his backyard,” I told him. “You don’t know what you want with the tiger, but all the other drug lords have one.”

I don’t know any actual drug lords (or tigers), so I’m not sure what’s in those backyards. But you get my point.

Data Science Leaders: There are too many of you. What’s the plan for training decision-makers with the skills to make data science teams successful? Hope is not a strategy!

…a pro-math subculture where it’s fashionable to display disdain for anything that smells like “soft” skills. It’s all chest-thumping about how hardcore you are for staying up all night proving some theorem or coding in your sixth language.

Rethinking Fast and Slow in Data Science. Is it possible for product development teams to reconcile rapid iteration with the slow-moving behemoth of the deep research process, or must they pick one?

Inspiration is cheap, but rigor is expensive.

Interview: Advice for data scientists. Candid answers to a fellow data scientist’s questions. Topics include: favorite resources, careers, statistics education, and data science leadership.

Useful is worth more than complicated. Data quality is worth more than method quality. Communication skills are worth more than yet another programming language.

Technology

9 Things You Should Know About TensorFlow. TensorFlow might be your new best friend if you have a lot of data and/or you’re after the state-of-the-art in AI. It’s not a data science Swiss Army Knife, it’s the industrial lathe. Here’s what’s new with it.

With TensorFlow Hub, you can engage in a more efficient version of the time-honored tradition of helping yourself to someone else’s code and calling it your own (otherwise known as professional software engineering).

5 Bite-Sized Data Science Summaries. 5 favorite talks from Google Cloud Next SF 2018. 5 video summaries. 5 minutes or less.

AI spent over half a century being more hype than happening. So, why now? Many people don’t realize that the story of today’s applied AI is actually a story about The Cloud.

Statistics

Don’t waste your time on statistics. How to determine whether you need statistics and what to do if you don’t.

Statistics is the science of changing your mind.

Never start with a hypothesis. Starting with hypotheses instead of actions is a common mistake among those who learn the math without absorbing any of the philosophy. Let’s look at how to use statistics for decision-making.

Hypotheses are like cockroaches. When you see one, it’s never just the one. There’s always more hiding somewhere nearby.

Statistics for people in a hurry. Ever wished someone would just tell you what the point of statistics is and what the jargon means in plain English? Let me try to grant that wish for you in 8 minutes!

The math is all about building a toy model of the null hypothesis universe. That’s how you get the p-value.
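Here is a minimal sketch of that toy-universe idea, done by simulation instead of formulas. The scenario is an assumption made up for illustration: we saw 60 heads in 100 coin flips, the null hypothesis is a fair coin, and we ask how often the null universe produces something at least that extreme (a one-sided test).

```python
import random


def simulated_p_value(observed_heads, flips, simulations, seed=0):
    """Estimate a one-sided p-value by simulating the null hypothesis universe,
    here assumed to be a fair coin."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    at_least_as_extreme = 0
    for _ in range(simulations):
        # One toy universe: flip a fair coin `flips` times and count heads.
        heads = sum(rng.random() < 0.5 for _ in range(flips))
        if heads >= observed_heads:
            at_least_as_extreme += 1
    return at_least_as_extreme / simulations


# Hypothetical data: 60 heads observed in 100 flips.
p_value = simulated_p_value(observed_heads=60, flips=100, simulations=10_000)
print(f"Estimated one-sided p-value: {p_value:.3f}")
```

The p-value is simply the fraction of toy universes that look at least as surprising as your data. A small fraction means your data would be an odd thing to see if the null hypothesis were true.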

Populations — You’re doing it wrong. A statistical approach only makes sense when there’s a mismatch between the information you want (population) and the information you have (sample). What happens if the project’s leader doesn’t know what information they want?

In the Icarus-like leap from sample to population, expect a big splat if you don’t know where you’re aiming.

Statistics Savvy Self-Test. Will you pass this small quiz that checks your statistical expertise? You might not if you believed what they told you in STAT101…

If you had facts, you wouldn’t need statistics.

Incompetence, delegation, and population. If the decision-maker doesn’t have the right skills, your whole statistical project is doomed. When is it appropriate for the statistician to make a fuss and when should they meekly follow orders?

If your goal is to persuade people using data, you may as well throw rigor out the window (since that’s where it belongs) and make pretty graphs instead.

Translations

My second Medium account hosts community-translated articles in other languages. Here are examples in Arabic, Chinese, Dutch, French, German, Hindi, Indonesian, Italian, Japanese, Portuguese (BR), Russian, Spanish, and Turkish. (Read this if you’d like to volunteer a translation.)

Podcasts

DI Podcast: I read my articles for those who prefer audio

30min GCP Podcast about Decision Intelligence

65min DataCamp podcast about making data science useful

Videos

29min talk about ethics and responsibility in AI

19min talk introducing Decision Intelligence

15min talk explaining Google Cloud’s ML offerings in terms of making pizza. If you’ve already watched one of the videos above, start at 5:25.

Thanks for reading! How about a YouTube course?

Wow! You got all the way to the end? Stamina challenge aced! You’re ready for the speed challenge… how many times can you clap in 5 seconds? :)

If you had fun here and you’re looking for an applied AI course designed to be fun for beginners and experts alike, here’s one I made for your amusement:

Enjoy the entire course playlist here: bit.ly/machinefriend

Liked the author? Connect with Cassie Kozyrkov

Let’s be friends! You can find me on Twitter, YouTube, Substack, and LinkedIn. Interested in having me speak at your event? Use this form to get in touch.