Opinion
If you Google the best major or degree for Data Science, you’ll find a wide variety of recommendations including math, statistics, and computer science. Over my decade-long career as a Data Scientist, I’ve found that no single major or degree spans all the skills necessary to excel in the field of data science. My math degree has absolutely helped me understand the inner workings of scikit-learn’s algorithms, but in order to write the code to deploy those algorithms into production, I’ve had to deeply expand my knowledge of computer science.
While math, statistics, and computer science teach Data Scientists the technical skills they need to do their job, I’ve found that soft skills are equally important and are not emphasized enough in those fields. To improve my own soft skills, I’ve found psychology and business to be extremely helpful.
For aspiring Data Scientists, I recommend you consider a major and/or minor in any of the areas in this article, and for current Data Scientists, let these areas and resources drive your learning.
Computer Science
Let’s start with one of the least surprising areas – computer science. Many of the Data Scientists I know studied technical fields like computer science, physics, or engineering. If you come from a field other than computer science, I think this is one of the most valuable areas in which to expand your knowledge.
Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician. – Josh Wills
As Josh Wills so eloquently puts it, software engineering is a fundamental competency for successful Data Scientists. You may be lucky enough to have a Machine Learning Engineer to productionalize your Jupyter Notebook for you; however, the reality for many Data Scientists is that they’ll need to write and deploy production-level code. And you don’t learn how to do that in your math classes.
If you’re unfamiliar with the wide range of topics in the field of computer science, you might check out the following resources.
- The Pragmatic Programmer by David Thomas and Andrew Hunt
- Algorithms to Live By by Brian Christian and Tom Griffiths
- Martin Fowler’s blog
Math and Statistics
On the other side of Josh Wills’ definition of Data Scientists are math and statistics. Frameworks like scikit-learn, PyTorch, and TensorFlow make it simple to create complex models, but behind every model is an algorithm using math to make decisions. Knowledge of math and statistics is non-negotiable for Data Scientists. Without that knowledge, you likely won’t understand why your model performs exceptionally well or poorly.
Once you have a solid understanding of the basics (calculus, linear algebra, and probability theory to name a few), I recommend you check out the following resources.
- Forecasting: Principles and Practice by Rob J Hyndman and George Athanasopoulos
- How Not to Be Wrong: The Power of Mathematical Thinking by Jordan Ellenberg
- Thinking in Bets by Annie Duke
- The Signal and the Noise by Nate Silver
- Fooled by Randomness by Nassim Nicholas Taleb
- Antifragile by Nassim Nicholas Taleb
Psychology
So let’s assume you already have a strong technical background and want to expand beyond computer science, math, and statistics. A field that I’ve found myself frequently pulling from is psychology.
On the surface, data science and psychology might not seem related; however, when you remove math and computer science, data science at its core helps individuals and organizations make faster, better, and cheaper decisions. We can pull from years of research on the psychology behind the decision-making process to build better machine learning systems.
Humans are at the core of every machine learning system. They generate the data used to train our models, consume the output of our models, and ultimately change their behavior as a result of the decisions that our models make (for better or for worse). While my degree program focused on which algorithms to use, I’ve found that it’s even more important to focus on the people interacting with those algorithms.
I’ve often found myself leaning on the field of psychology to solve problems in my day-to-day data science work. Take, for example, the process of obtaining a labeled data set. For many of the most interesting problems, there isn’t a perfectly labeled data set for what you are trying to accomplish. Before we can build a model, we depend on people to annotate data for us.
But crafting the best labeling task for our annotators is more complicated than it seems. Let’s say I want my annotators to label data for a multi-class classification problem. Did you know there’s a limit on the number of classes a human can distinguish between? What if I need my annotators to perform two labeling tasks for my data? Will the first task bias the result of the second task? When I receive my annotations, how will I handle annotator disagreement and bias?
These are just a few examples of important questions to ask before we even touch any machine learning models. The field of psychology has been collecting data on humans for longer than any other profession that I’m aware of, and there is some crucial knowledge that the field of data science could benefit from. All of the questions above are addressed in the field of psychology in one way or another – we just need to apply the findings.
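As one concrete illustration of the annotator disagreement question, here is a minimal sketch of Cohen's kappa, a statistic with roots in psychological measurement that scores agreement between two annotators after correcting for the agreement expected by chance. The function name and the example labels are hypothetical, and the sketch assumes exactly two annotators who labeled the same items in the same order.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators (Cohen's kappa)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of items.")

    n = len(labels_a)

    # Observed agreement: share of items where both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement: chance that both pick the same label if each annotator
    # labeled independently according to their own label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label for everything; kappa is
        # undefined here, and this sketch treats it as perfect agreement.
        return 1.0

    return (observed - expected) / (1 - expected)


# Hypothetical labels from two annotators for the same four items.
annotator_a = ["cat", "cat", "dog", "dog"]
annotator_b = ["cat", "cat", "dog", "cat"]
print(cohens_kappa(annotator_a, annotator_b))
```

A value near 1 suggests the labeling instructions are being read consistently, while a value near 0 suggests the annotators agree about as often as chance would predict, which is often a sign that the task itself needs redesigning.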
If you’re looking to get your feet wet in the world of psychology, here are a few of my favorite books.
Business
Last, but certainly not least, let’s discuss the world of business. Understanding how businesses function and operate is key because they are usually the ones paying you as a Data Scientist to do your job. The knowledge of what your business does, how it does it, and how you fit into the overall picture is something you will need to learn quickly to become an effective Data Scientist.
After learning this, I’ve found it helpful to learn more about specific positions that specialize in understanding a "customer’s" problem. I generally think of my employer as my internal customer and my employer’s customer as my external customer. In either case, I want to become an expert in the problems that both are facing. Product Managers are often the experts in the external customer’s problem.
Meeting with Product Managers and researching their field has helped me better understand my customers and create machine learning systems that fully meet their needs. As you begin to develop your skills in this area, you may realize there is a fair amount of overlap between product management and data science. Product Managers need a basic understanding of statistics to comprehend the A/B tests their product teams run, and they need to be able to evaluate customer data. It’s also helpful for them to have some user experience knowledge, which brings us back to the world of psychology.
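To make the statistics overlap concrete, here is a minimal sketch of one common way to read an A/B test on conversion rates: a two-proportion z-test. This is only one of several possible approaches, the function name and the numbers are hypothetical, and it assumes independent users and samples large enough for the normal approximation to hold.

```python
import math


def two_proportion_z_test(conversions_a, visitors_a, conversions_b, visitors_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b

    # Pooled conversion rate under the null hypothesis that A and B perform the same.
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    if standard_error == 0:
        raise ValueError("Cannot test when the pooled conversion rate is 0 or 1.")

    z = (rate_b - rate_a) / standard_error

    # Two-sided p-value from the standard normal distribution:
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: variant B converts 150 of 1,000 visitors, A converts 100 of 1,000.
z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference between variants is unlikely to be noise alone, but it says nothing about whether the difference is large enough to matter to the business, which is exactly the conversation to have with a Product Manager.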
Learning more about product management, general business, or anything in between will set you up for success as a Data Scientist. A few resources I recommend to kick-start your learning are listed below.
- The Product Book by Josh Anon and Carlos González de Villaumbrosia
- User Experience Customer Journey Mapping
Approach to Learning
So how do you attempt to learn about a field that’s completely new to you? Formal education, like a university degree program, can be a great way to learn foundational knowledge in a new field, but it usually doesn’t equip you with everything you need to know on the job.
One of my favorite ways to learn a new field is through project-based learning. It’s much easier to sink my teeth into a new topic or area if I have some kind of project to reinforce my learning. It’s even more helpful if the project is something I care about and not a random project from someone’s GitHub.
If you’re currently a Data Scientist, you should already have a project that will help direct your learning. If not, you may want to take a workshop that teaches you to ideate, design, and build a project.
I hope the resources that I’ve provided throughout this article will serve as a solid starting point for your learning. For those who are better read than I am, leave a comment with your favorite books, blogs, podcasts, or videos about any of the fields I mentioned in this article. I would love to learn from you!
Conclusion
Specialization is what many individuals strive for throughout their careers. When you become an expert in a single area, you are usually rewarded. However, data science challenges us to become generalists. Broadening your understanding of the fields I mentioned above (and many more!) can strengthen your career. You never know which field you may dive into next to build a better machine learning model.