10 Most Practical Data Science Skills You Should Know in 2022

Skills that will actually make you employable

Photo by Carles Rabada on Unsplash

Introduction

Many "How to Data Science" courses and articles, including my own, tend to highlight fundamental skills like Statistics, Math, and Programming. Recently, however, I noticed through my own experiences that these fundamental skills can be hard to translate into practical skills that will make you employable.

Therefore, I wanted to create a unique list of practical skills that will make you employable.

The first four skills that I talk about are absolutely pivotal for any data scientist, regardless of what you specialize in. The remaining skills (5–10) are all important, but how much you use them will depend on your specialization.

For example, if you’re more statistically grounded, you might spend more time on inferential statistics. Conversely, if you’re more interested in text analytics, you might spend more time learning NLP, or if you’re interested in decision science, you might focus on explanatory modeling. You get the point.

With that said, let’s dive into what I believe are the 10 most practical data science skills:

Be sure to subscribe here or to my personal newsletter to never miss another article on data science guides, tricks and tips, life lessons, and more!


1. Writing SQL Queries & Building Data Pipelines

Learning how to write robust SQL queries and schedule them on a workflow management platform like Airflow will make you extremely desirable as a data scientist, which is why it’s point #1.

Why? There are many reasons:

  1. Flexibility: companies like data scientists who can do more than just model data. Companies LOVE full-stack data scientists. If you’re able to step in and help build core data pipelines, you’ll be able to improve the insights that are gathered, build stronger reports, and ultimately make everyone’s lives easier.
  2. Independence: there will be instances where you need a table or view for a model or a data science project that does not exist. Being able to write robust pipelines for your projects instead of relying on data analysts or data engineers will save you time and make you more valuable.

Therefore, you MUST be an expert at SQL as a data scientist. There are no exceptions.
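
To make the "build the table yourself" idea concrete, here is a minimal sketch of one pipeline step, using Python's built-in sqlite3 module so that it runs anywhere. The `orders` table, its columns, and the `build_customer_summary` function are hypothetical names invented for illustration. In practice a step like this would typically run against your company's warehouse and be scheduled by a workflow tool such as Airflow, which is not shown here.

```python
import sqlite3

# In-memory database with a hypothetical `orders` table (made-up data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER,
        amount      REAL,
        order_date  TEXT
    );
    INSERT INTO orders (customer_id, amount, order_date) VALUES
        (1, 20.0, '2022-01-03'),
        (1, 35.5, '2022-01-10'),
        (2, 12.0, '2022-01-04'),
        (3, 99.9, '2022-01-05'),
        (3, 10.1, '2022-02-01');
""")


def build_customer_summary(connection):
    """One pipeline step: rebuild a per-customer summary table from raw orders.

    Dropping and recreating the table makes the step safe to re-run,
    which matters once a scheduler starts retrying failed jobs.
    """
    connection.executescript("""
        DROP TABLE IF EXISTS customer_summary;
        CREATE TABLE customer_summary AS
        SELECT customer_id,
               COUNT(*)              AS n_orders,
               ROUND(SUM(amount), 2) AS total_spent,
               MAX(order_date)       AS last_order_date
        FROM orders
        GROUP BY customer_id;
    """)


build_customer_summary(conn)
rows = conn.execute(
    "SELECT customer_id, n_orders, total_spent, last_order_date "
    "FROM customer_summary ORDER BY customer_id"
).fetchall()
for row in rows:
    print(row)
```

The design choice worth noticing is that the step is idempotent: running it twice leaves the same `customer_summary` table, so it can be rescheduled or retried without cleanup.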

Resources

A Complete 15 Week Curriculum to Master SQL for Data Science

Mode SQL Tutorial – Mode


2. Data Wrangling / Feature Engineering

Whether you’re building models, exploring new features to build, or performing deep dives, you’ll need to know how to wrangle data.

Data Wrangling means transforming your data from one format to another.

Feature Engineering is a form of data wrangling but specifically refers to extracting features from raw data.

It doesn’t necessarily matter how you manipulate your data, whether you use Python or SQL, but you should be able to manipulate your data however you like (within the parameters of what is possible of course).
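
As a small illustration of the difference, the sketch below first wrangles raw string records into typed values and then engineers a few features from the timestamp. The record layout and the `engineer_features` helper are assumptions made up for this example, and it uses only the standard library so that it runs as-is. With a library like pandas the same idea would be expressed on whole columns at once.

```python
from datetime import datetime

# Hypothetical raw event records, as they might arrive from a log or CSV export.
raw_events = [
    {"user_id": 1, "timestamp": "2022-01-08 09:30:00", "amount": "19.99"},
    {"user_id": 2, "timestamp": "2022-01-10 22:15:00", "amount": "5.00"},
]


def engineer_features(event):
    """Turn one raw record into model-ready features."""
    # Wrangling: parse strings into proper types.
    ts = datetime.strptime(event["timestamp"], "%Y-%m-%d %H:%M:%S")
    amount = float(event["amount"])
    # Feature engineering: derive new signals from the raw timestamp.
    return {
        "user_id": event["user_id"],
        "amount": amount,
        "hour": ts.hour,
        "day_of_week": ts.weekday(),      # Monday = 0, Sunday = 6
        "is_weekend": ts.weekday() >= 5,  # Saturday or Sunday
    }


features = [engineer_features(e) for e in raw_events]
for row in features:
    print(row)
```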

Resources

Fundamental Techniques of Feature Engineering for Machine Learning

Discover Feature Engineering, How to Engineer Features and How to Get Good at It – Machine Learning…


3. Version Control

When I say "version control", I’m specifically referring to Git and GitHub. Git is the most widely used version control system in the world, and GitHub is essentially a cloud-based hosting service for Git repositories.

While Git is not the most intuitive skill to learn at first, it’s essential to know for almost every single coding-related role. Why?

  • It allows you to collaborate and work on projects in parallel with others
  • It keeps track of all versions of your code (in case you need to revert to older versions)

Take the time to learn Git. It will take you far!


4. Storytelling (i.e. Communication)

It’s one thing to build a visually stunning dashboard or an intricate model with over 95% accuracy. BUT if you can’t communicate the value of your projects to others, you won’t get the recognition that you deserve, and ultimately, you won’t be as successful in your career as you should.

Storytelling refers to "how" you communicate your insights and models. Conceptually, if you were to think about a picture book, the insights/models are the pictures and the "storytelling" refers to the narrative that connects all of the pictures.

Storytelling and communication are severely undervalued skills in the tech world. From what I’ve seen in my career, this skill is what separates juniors from seniors and managers.


5. Regression/Classification

Building regression and classification models, i.e. predictive models, is not something that you’ll always be working on, but it’s something that employers will expect you to know if you’re a data scientist.

Even if it’s not something that you’ll do often, it’s something that you have to be good at because you want to be able to build high-performing models. To give some perspective, in my career so far, I’ve only productionalized TWO Machine Learning models, but they were mission-critical models that had a significant impact on the business.

Therefore, you should have a good understanding of data preparation techniques, boosted algorithms, hyperparameter tuning, and model evaluation metrics.
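
Of the topics above, model evaluation metrics are the easiest to show in a few lines. The sketch below computes accuracy, precision, recall, and F1 for a binary classifier from scratch. The labels and predictions are made-up values and `classification_metrics` is a hypothetical helper. In a real project you would more likely call the equivalents in a library such as scikit-learn.

```python
def classification_metrics(y_true, y_pred):
    """Compute basic binary-classification metrics (positive class = 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up ground truth and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Accuracy alone can look great on imbalanced data, which is why precision and recall are usually reported alongside it.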

Resources

All Machine Learning Algorithms You Should Know in 2021

How To Prepare Your Data for Your Machine Learning Model


6. Explanatory Models

There are two types of models that you can build. One is a predictive model, which estimates an outcome based on a number of input variables. The other is an explanatory model, which isn’t used to make predictions but to better understand the relationships between the input variables and the output variable.

Explanatory models are usually built using regression models because they provide many useful statistics (such as coefficients and p-values) for understanding the relationships between the variables.
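
Here is a minimal sketch of what "useful statistics" can look like for the simplest case, a one-variable linear regression fitted by ordinary least squares. The `ad_spend` and `revenue` numbers are invented and `fit_simple_ols` is a hypothetical helper. A full explanatory analysis would also look at standard errors and p-values, which a statistics package would report for you.

```python
def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R^2: share of the variance in y explained by the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


# Made-up data: ad spend and revenue in the same (arbitrary) units.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
revenue = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept, r_squared = fit_simple_ols(ad_spend, revenue)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r_squared:.3f}")
```

The point of an explanatory model is the interpretation rather than the forecast: here the slope reads as "each extra unit of ad spend is associated with roughly this much extra revenue", keeping in mind that association is not the same as causation.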

Explanatory models are incredibly undervalued and useful, and are essential if you want to get into the field of decision science.

Resources

Interpreting Results in Explanatory Modeling


7. A/B Testing (Experimentation)

A/B testing is a form of experimentation where you compare two different groups to see which performs better based on a given metric.

A/B testing is arguably the most practical and widely used statistical concept in the corporate world. Why? A/B testing allows you to compound hundreds or thousands of small improvements, resulting in significant changes and improvements over time.

If you’re interested in the statistical aspect of data science, A/B testing is essential to understand and learn.
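
To show what comparing two groups on a metric can look like statistically, here is a minimal sketch of a two-sided two-proportion z-test on conversion rates. This is one common way to analyze an A/B test, not the only one. The visitor and conversion counts are made up and `two_proportion_z_test` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Made-up experiment: control (A) vs. variant (B), 2,000 visitors each.
z, p_value = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=250, n_b=2000)
print(f"z={z:.2f}, p-value={p_value:.4f}")
```

A small p-value suggests the difference in conversion rate is unlikely to be pure noise. In practice you would also decide on the sample size and significance threshold before running the experiment.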

Resources

A/B Testing – A complete guide to statistical testing


8. Clustering

Personally, I haven’t had to use clustering in my career, but it’s a core area of data science that everyone should at least be familiar with.

Clustering is useful for a number of reasons. You can find different customer segments, you can use clustering to label unlabeled data, and you can even use clustering to find cutoff points for models.
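
As an illustration of the customer-segment use case, below is a minimal sketch of k-means on a single variable. The `monthly_spend` values and the starting centroids are made up, and `kmeans_1d` is a hypothetical helper written for this example. Real projects would normally use a library implementation with multiple features and a better initialization.

```python
def kmeans_1d(values, centroids, n_iter=10):
    """Very small k-means for one-dimensional data with given start centroids."""
    centroids = list(centroids)

    def nearest(v):
        # Index of the centroid closest to value v.
        return min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))

    for _ in range(n_iter):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            clusters[nearest(v)].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]

    labels = [nearest(v) for v in values]
    return centroids, labels


# Made-up monthly spend per customer.
monthly_spend = [12, 15, 14, 80, 85, 90, 300, 310]
centroids, labels = kmeans_1d(monthly_spend, centroids=[0, 100, 400])
print(centroids)
print(labels)
```

The resulting labels split the customers into low, medium, and high spenders, and the midpoints between neighboring centroids are one way to derive cutoff points.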

Below are some resources that go over the most important clustering techniques that you should know.

Resources

The 5 Clustering Algorithms Data Scientists Need to Know

10 Clustering Algorithms With Python – Machine Learning Mastery


9. Recommendation

While I haven’t had to build a recommendation system in my life (yet), it’s one of the most practical applications in data science. Recommendation systems are so powerful because they have the ability to propel revenue and profits. In fact, Amazon claimed to have boosted their sales by 29% due to their recommendation systems in 2019.

And so, if you ever work for a company whose users have to choose from a lot of options, recommendation systems might be a useful application to explore.
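
To give a feel for how such a system can work, here is a minimal sketch of user-based collaborative filtering, one of several common approaches. It scores items a user has not rated yet using the ratings of similar users. The `ratings` data, user names, and the `recommend` helper are all invented for illustration and say nothing about how any particular company builds its recommender.

```python
from math import sqrt

# Made-up ratings on a 1-5 scale: user -> {item: rating}.
ratings = {
    "alice": {"book_a": 5, "book_b": 4, "book_c": 5},
    "bob":   {"book_a": 4, "book_b": 5},
    "carol": {"book_a": 1, "book_c": 2, "book_d": 5},
    "dave":  {"book_b": 5, "book_d": 2},
}


def cosine_similarity(u, v):
    """Cosine similarity between two rating dicts (missing ratings count as 0)."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


def recommend(user, ratings, top_n=2):
    """Rank unseen items by similarity-weighted average rating."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        if sim <= 0:
            continue
        for item, rating in other_ratings.items():
            if item in ratings[user]:
                continue  # only score items the user has not rated
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    predicted = {item: scores[item] / weights[item] for item in scores}
    return sorted(predicted, key=predicted.get, reverse=True)[:top_n]


recommendations = recommend("bob", ratings)
print(recommendations)
```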


10. NLP

NLP, or Natural Language Processing, is a branch of AI that focuses on text and speech. Unlike machine learning as a whole, I’d say that NLP is far from mature, which is what makes it so interesting.

NLP has a lot of use cases:

  • It can be used for sentiment analysis to see how people feel about a business or its products.
  • It can be used to monitor a company’s social media by separating positive and negative comments.
  • It is the core technology behind chatbots and virtual assistants.
  • It is also used for text extraction (sifting through documents).
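
The comment-separating use case above can be sketched with a deliberately simple word-list approach. The word lists, the example comments, and the helper names below are all made up for illustration. Production sentiment analysis would normally rely on a trained model rather than hand-picked words, so treat this only as a picture of the idea.

```python
import re

# Tiny hand-picked lexicons (illustrative only, far from complete).
POSITIVE_WORDS = {"great", "love", "excellent", "fast", "helpful"}
NEGATIVE_WORDS = {"bad", "slow", "broken", "terrible", "hate"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Count of positive words minus count of negative words."""
    tokens = tokenize(text)
    positives = sum(1 for t in tokens if t in POSITIVE_WORDS)
    negatives = sum(1 for t in tokens if t in NEGATIVE_WORDS)
    return positives - negatives


def split_by_sentiment(comments):
    """Group comments into positive, negative, and neutral buckets."""
    buckets = {"positive": [], "negative": [], "neutral": []}
    for comment in comments:
        score = sentiment_score(comment)
        if score > 0:
            buckets["positive"].append(comment)
        elif score < 0:
            buckets["negative"].append(comment)
        else:
            buckets["neutral"].append(comment)
    return buckets


# Made-up social media comments.
comments = [
    "Love the new app, support was fast and helpful!",
    "Terrible update. Everything is slow and broken.",
    "The package arrived on Tuesday.",
]
buckets = split_by_sentiment(comments)
print(buckets)
```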

Overall, NLP is a really interesting and useful niche in the data science world.

Resources

10 NLP Techniques Every Data Scientist Should Know


Thanks for Reading!

I hope that this helps guide your learning and gives you some direction for the upcoming year. There is a lot to learn, so I would definitely choose a couple of skills that sound most interesting to you and go from there.

Do keep in mind that this is more of an opinionated article that is backed by anecdotal experience, so take what you want from this article. But as always, I wish you the best in your learning endeavors!

Not sure what to read next? I’ve picked another article for you:

All Probability Distributions Explained in Six Minutes

and another one!

OVER 100 Data Scientist Interview Questions and Answers!

Terence Shin

