
You’re Likely Learning Data Science Wrong, Here’s How to Learn Instead

Breaking apart commonly held fallacies in this field to design a new set of principles

Getting Started

Sunset on Lake Michigan. Image by author

Most young, aspiring data scientists I’ve talked to are really enticed by the accolades, just like I was. I spent a lot of time researching the perfect courses, programs, and achievements so that I could have a suite of badges that looked impressive to others.

The problem? You impress the wrong people because the ones who are highly skilled just don’t care about that stuff.

Chasing accolades and badges will lead you to complete those month-long Coursera courses within 1 month so you can share them, while feeling no pressure to actually apply those skills. It makes you feel special for reading 52 books in a year and telling others you did, instead of really mastering one or two books in that same year. This path will not only leave you short of the results you want, it will also lead you to create your own Imposter Syndrome.

Let me propose a better route for achieving your dreams.

A better way of learning Data Science starts with breaking down a number of fallacies that I’ve found people currently hold. These can stem from improper messaging about the craft and its skills, and from too much hype giving people a misguided idea of what their journey should be.

Fallacy #1: Needing to know everything.

Anyone who has googled the terms Data Science, Machine Learning, AI, or anything tangential knows how many skills are involved in this craft. Stats, Linear Algebra, Calculus, Python, R, Spark, SQL, Cloud, Domain Knowledge, etc. It’s incredibly overwhelming for everyone involved.

Due to a misrepresentation of the craft and sometimes a misunderstanding of the needs, we also get job descriptions that read the same way, which adds to this feeling that we need to know and be everything. We need to have 10+ years of TensorFlow experience (lol) along with a PhD in Stats along with domain expertise in Financial Markets along with superhuman abilities.

The craft is evolving fast enough that these job descriptions are getting better, but the feeling remains that practitioners need to know and be everything. This fallacy leads people to take every course, read every book, and do every Kaggle project they humanly can to boost their resume.

Here’s the problem with this strategy: this only impresses other juniors.

Seniors and up, the people who actually make decisions, and the people whom you actually want to impress care about results, value, and what you can deliver. They don’t really value your Coursera achievements. Let’s be clear though: everyone starts somewhere and I’m not downplaying MOOCs at all. In fact, I wrote a whole story on how you can get started with low-cost MOOCs so I really do support them in lowering the barrier to entry for many people. I’m just saying to not delude yourself into thinking this will gain you more than a personal sense of achievement, which is arguably still very valuable.

Instead of falling for this trap and ending up completely burnt out without landing what you want, here’s what I’d recommend: spend 6 months studying one or two topics and then another 6 months applying what you learned.

For the first 6 months, I’d use the Feynman Technique. Do the Coursera courses or books or whatever you can grab, and follow these steps to learn it:

  1. Choose a concept you want to learn about. If it doesn’t spark your curiosity, find something about the topic’s application or a future evolution of the topic that can generate that interest. If still nothing, you’re probably trying to learn the wrong thing. Interest generated from curiosity is a crucial starting point.
  2. Explain it to a 12-year-old. If you can’t explain it simply, you won’t succeed in any interview. People aren’t looking for mathematical formula recitations; they’re looking for deep understanding.
  3. Reflect, Refine, and Simplify. Check your assumptions twice and keep improving your ability to think critically about what you’re learning. Don’t take what you’re reading at face value; break it open to figure out why things work the way they do.
  4. Organize and Review. Organize your thoughts on paper. Write things down and test them out in the real world by challenging people. Hear their debates and counter-points and probing questions and see how you navigate those. If you can’t, you may not understand it as well as you thought.

It may seem like overkill to learn only one topic for 6 months, but if you follow this route properly (especially step 3), you’ll find yourself broadening your horizons quite a bit to really learn that topic. You won’t be focused on the wrong things or on completing arbitrary courses. You’ll be focused on the gold, which is true understanding.

The next 6 months are just as crucial as the first. You have to build and you have to build something where the blueprints are not handed to you. If you just want to know about Data Science then you probably don’t need this, but if you really want to be a practitioner (and a good one) then you have to foray into unknown territory and show that you can hold your own.

People make a huge blunder here by going to Kaggle and doing the obvious projects that everyone has already done a billion times. This isn’t valuable to your learning, it isn’t a good use of your time, and it isn’t valuable to the people you should be trying to impress.

Instead, start with a question about an industry or a domain that you have an interest in and then look up open APIs to data that are out there related to it. Spend a substantial amount of time collecting some of that data and cleaning/manipulating it. It’s likely to be messy and you’re likely to draw from completely disparate sources. This sounds difficult but this is SO much of the craft in practice, so it will dramatically set you apart from your competition.
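To make the "messy and disparate" part concrete, here is a minimal sketch in Python of what that cleaning work tends to look like. Everything in it is an assumption made for illustration: the two tiny datasets, their field names, and the helper functions are hypothetical stand-ins for whatever your own open APIs return. The point is only that two sources rarely agree on date formats, number formats, or completeness, and reconciling them is most of the job.

```python
from datetime import datetime

# Hypothetical records, standing in for rows pulled from two unrelated open APIs.
weather_rows = [
    {"Date": "2021-03-01", "temp_f": "41.0"},
    {"Date": "2021-03-02", "temp_f": ""},       # missing reading
    {"Date": "2021-03-03", "temp_f": "38.5 "},  # stray whitespace
]
ridership_rows = [
    {"day": "03/01/2021", "rides": "1,204"},    # thousands separator
    {"day": "03/03/2021", "rides": "987"},
    {"day": "03/03/2021", "rides": "987"},      # duplicate row
]


def clean_weather(rows):
    """Return {date: temperature in F}, skipping rows with no reading."""
    cleaned = {}
    for row in rows:
        raw = row["temp_f"].strip()
        if not raw:
            continue
        day = datetime.strptime(row["Date"], "%Y-%m-%d").date()
        cleaned[day] = float(raw)
    return cleaned


def clean_ridership(rows):
    """Return {date: ride count}; duplicate rows for a date collapse to one."""
    cleaned = {}
    for row in rows:
        day = datetime.strptime(row["day"], "%m/%d/%Y").date()
        cleaned[day] = int(row["rides"].replace(",", ""))
    return cleaned


def join_on_date(weather, ridership):
    """Keep only dates present in both sources, sorted by date."""
    shared = sorted(weather.keys() & ridership.keys())
    return [(day.isoformat(), weather[day], ridership[day]) for day in shared]


combined = join_on_date(clean_weather(weather_rows), clean_ridership(ridership_rows))
print(combined)
```

Even in this toy case, one day drops out because a reading is missing and one row disappears because it was a duplicate. Noticing and explaining those losses is exactly the kind of judgment that tutorial datasets never ask of you.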

And then spend the rest of the time building out a solution that uses the skills you learned in the first 6 months. Bonus points if you can write about it or create content that shares what you learned or what you failed at. By the way, doing it this way will also mean you’re forced to learn much more than that one skill in these 6 months too.

This is really the only meaningful way to learn this craft, and the best people know it. Very few people actually do this, but the ones who do become incredibly talented over time. Even if they don’t completely succeed in building the perfect project at the end, the lessons learned along the way and the story they’re able to tell are far more valuable.

Fallacy #2: Over-Emphasizing Math, Under-Emphasizing People.

I think this is due to a misrepresentation of the craft and of what’s really needed to be a successful Data Scientist/Machine Learning Engineer. Although mathematical and statistical theory is really valuable, you usually don’t need a team where everyone has this skill. In fact, if you were building out a team, you would want more people who are engineering-savvy than math-savvy. This is typically why people find it hard to land roles just by knowing mathematical theory without having many projects under their belt.

You’re definitely a game changer if you know the mathematical theory but only if you know it in the context of programming. If you can’t code or exhibit your ability to code then that’s 100% where you need to start.

A follow-up issue with this fallacy is that the two worlds of programming and math get debated all the time, while communication and people skills get written off as "soft." Doing this is really detrimental to your future prospects, because influencing people and communicating are such a big part of the craft and they are incredibly hard. In fact, they are so hard that I feel confident in saying that most people who think they have this skill really, really don’t.

This ability includes, but isn’t limited to:

  1. Truly listening to the needs of your customer or business partner and asking specific questions that bring relevant clarity to a problem.
  2. Proposing a solution that directly solves the problems they have, and that they can understand and maintain.
  3. Building a trusting relationship with your customer in a short amount of time so that they trust you with solving their problem and provide adequate support.
  4. Influencing the right leaders to pivot or try different ideas that you believe will work but weren’t originally proposed. Selling your ideas is something you do almost every day in this function.
  5. Storytelling. This is 100% not a skill everyone has and it’s not just data visualization. Storytelling is the art of communicating a complex idea in a clear and concise way so people are meaningfully influenced by it. Typically, in business settings, this means a change in some action or policy in response to the story.

None of the above, and more, works if you don’t have people skills. My job is dealing with people all day long, so please make sure you’re developing these skills as much as you’re learning the technical depths of the craft. The technical depths may get your foot in the door, but you’ll hit a career wall incredibly fast if you can’t influence people.

Fallacy #3: Optimizing for the Destination and not the Journey.

Data Science, Machine Learning, and AI are not crafts that are final points. They are crafts that require you to keep learning and reinventing yourself. Not necessarily to stay relevant, but more to keep elevating how you’re delivering value (for yourself and for others).

You have to be someone who cherishes enriching your mind to keep growing and improving, otherwise this craft can be really miserable for you. It’s a craft built for those who are doing more than they’re told mainly due to their passion for learning and building.

Because of this, you have to love the journey, because it’s an infinite game. There is no realistic end point in this craft, and if you think there is, then you’re playing the wrong game. Even for a single project, there are often countless ways you could improve an initial idea or model at the point of delivery, but you have to know when to walk away and be happy with what you’ve done. The field never stops evolving because it has too many vast depths to traverse.

If you see holding the "Data Scientist" title as the most valuable thing, then the chase never ends (Senior, Manager, VP, etc.) and you’re focusing on things that won’t actually provide any meaning anyway. Pay attention to what those with these skills can actually do and are actually doing. Pay attention to how they think and how they built those mental models. Their competitive edge is not the titles and accolades. It’s their ability to think critically at an incredibly efficient pace to solve vastly complex business problems with technology.

This skill is only gained when you love putting in the time to keep getting better.

A New Set of Principles

Breaking apart commonly held fallacies creates the need for a new set of principles to guide your own journey. Whatever the reason for those fallacies (poor expectations, bad job postings, too much hype, etc.), it’s imperative that you read about, talk to, and learn from actual practitioners in the field about meaningful ways to progress.

Principle #1: Learn with Momentum Instead of Speed

Focusing on speed will result in you arbitrarily finishing every single MOOC out there, instead of focusing on just the piece of a concept you need to understand for a project. Learning with momentum means building the right set of foundational blocks so that you can move faster in the future. This is the best way to take advantage of compound growth for yourself.

Principle #2: Learn with Others Instead of Solo

The best time to actively join a community is yesterday, and the second best time is today. Engage with people around you, ask them questions about what they specifically know and try to deeply understand how they think. Share your knowledge and failures openly so future practitioners can be better. Just as the tried & true method for alleviating risk is to diversify and spread out your risk, the tried & true method for compounding growth is to diversify and spread out your community.

Principle #3: Learn with Uncertainty Instead of Certainty

Infinite games are ones that don’t have a clear set of rules, start/end points, or arbitrary competitors. Infinite games are endlessly evolving and rife with uncertainty, whereas finite games are the exact opposite. This craft is moving much faster than anyone can reasonably keep up with, but it is the future of how we do business and live life. It’s only going to keep getting faster from here, so you have to be okay dancing in the uncertainty of the journey rather than obsessing over the certainty of any arbitrary destination.

If you have any thoughts on fallacies I may have missed or principles that these three don’t cover, please comment and engage! I would love to broaden my horizons and keep the discussion going.

