Do you need a graduate degree for data science?

Maybe so. Maybe not.

Jeremie Harris
Towards Data Science
8 min read · Dec 18, 2018

--

I’ll level with you: I’m a PhD dropout.

I’ve gotten a lot of mileage out of that title, by the way: it hints that I’ve done a lot of grad school, but still maintains the aura of badassery that only the word “dropout” can provide. In some ways, it’s the ultimate humble brag. Graduate with a PhD, and you’re one nerd among ten thousand. But drop out 2.5 years into it, and you’re an edgy nerd. People will wonder what other edgy shit you might do next. “Elon Musk dropped out of grad school,” they’ll say. “This guy could be just like Elon!”

As much good as dropping out of grad school has done for my reputation as an unpredictable, 4-D chess playing, closet nerd genius, it’s become clear to me that two of the key ingredients to becoming a PhD dropout aren’t for everyone: 1) starting a PhD, and 2) dropping out of a PhD. And the same goes for a Master’s. This is true for the average aspiring STEM professional, but it’s even more true for aspiring data scientists. And I’ll get to the reason why in a second.

But first, you might be wondering how I know this.

Well here’s the thing: I work at a data science mentorship startup. And through that work I’ve probably interviewed over a thousand aspiring data scientists — some with PhDs, some with Master’s degrees, some with undergrads, and some who dropped out at every stage in between. And that’s left me with a rare and precious thing: a statistically significant sample of data science career stories.

And here’s what I’ve learned from those stories: there’s a time, a place, and a person for whom different degrees make sense. But because most people turn to university and college graduate advisors to decide whether or not to do grad school, they tend not to get a complete picture of what’s what before they sign up.

So take it from a former academic-turned-startup-founder: not all degrees are for everyone. Here’s why.

1. The PhD

[Warning: this is going to upset a lot of people who have PhDs. I apologize in advance.]

“I’ve seen that a PhD is required for a lot of data science jobs. Do I need a PhD to become a data scientist?”

Oh my god no. Not a thing.

Don’t get me wrong, going through life with those three extra letters in your email sign-off is a definite plus. I sometimes wish I’d stuck around for that reason alone. But then reality sets in.

If your goal is to become a data scientist or machine learning engineer/researcher, a PhD *might* be a good move. But there are also two big reasons why it might not be:

  1. It takes a REALLY long time to get a PhD.
  2. You’re unlikely to learn anything of value unless you get the “right” PhD from the “right” supervisor.

To the first point: in the U.S. or Canada, a PhD takes anywhere from 4 years to 7 or 8 years to complete. The median time to completion is usually 5 or 6 years, depending on the institution. Now let’s put that into perspective.

You know what wasn’t a thing in data science 5 years ago? Spark, XGBoost, Jupyter notebooks, GloVe, spaCy, TensorFlow, Keras, PyTorch, InceptionNet, ResNet, reinforcement learning (like, basically at all), etc, etc.

So unless you decided to learn about these things on your own when they came out (in which case, I’m not sure that grad school deserves the credit), there’s a chance your PhD would have left you in the position of someone who was cryogenically frozen in 2012, only to be re-animated today as a complete newbie. You’d find yourself in a brave new world of data science techniques that you’d have to learn on your own after graduation day anyway.

The point is, things move very fast in data science and machine learning. And they’ll only move faster in the future. So if you’re considering a PhD in a data science or machine learning-related field, and your goal is to work in industry some day, just keep in mind that you’re essentially placing a bet on the area you’re specializing in: you’re banking on it being both relevant and in high demand when you finally graduate. And that can be a risky bet, with some pretty high stakes.

To the second point: take a moment to think about who would be supervising you, and why they aren’t already working at Google or Facebook.

Of course, there are people who simply prefer academic research to doing data science or machine learning in industry. But it’s worth keeping in mind that the amount of money offered to top-tier ML talent in industry is high enough that there’s definite downward selective pressure on people who stay in academe.

And of course there are places where you can consistently find exceptions to this rule. These are usually super-elite programs like the Vector Institute or MILA in Canada, or data science programs at MIT and Berkeley in the U.S. You’ll know them when you see them, but just keep in mind that your state college, or that school that’s ranked among the “top 200” in the world, probably won’t be one.

To sum up: if you’re only interested in becoming a deep learning engineer at Airbnb, then sure, a PhD might be one of relatively few ways through the door. But don’t expect to get hired at a platinum+ company if you’re not doing your PhD in a platinum+ program.

But if you’re looking for a more typical (read: more realistic) data science role, a PhD is rarely the right move. You’ll be better off investing those 4 to 8 years into getting work experience as an actual data scientist, where you’ll learn new techniques when they come out, and where you’ll be in a better position to anticipate new trends and stay ahead of them.

Oh, and if you’re considering a PhD in an area that’s not data science-related at all (e.g. physics, biology, chemistry), and you’re aiming for a data science role, here’s a useful yet harsh heuristic: if you’re 18 months or more away from graduation (and you’re really sure you want to be a data scientist), just drop out. The sunk cost fallacy will have you second-guessing this strategy (and you should second-guess it), but in my (statistically significant) experience it’s much more likely to be the right move than not.

2. The M.Sc.

Do you need a Master’s to do data science?

It depends. Here’s a scorecard I just made up. Add up the points that apply to you, and if the total is greater than 6, then the answer is “a Master’s will probably be helpful.”

  • You have a “hard” STEM background (physics/math/CS undergrad or other degree/diploma): 0 points
  • You have a “soft” STEM background (biology/biochemistry/economics undergrad or other degree/diploma): 2 points
  • You have a non-STEM background: 5 points
  • You have less than 1 year of experience working with Python: 3 points
  • You’ve never had a job that involves coding: 3 points
  • You don’t think you’re good at independent learning: 4 points
  • You don’t understand what I mean when I say that this scorecard is basically a logistic regression algorithm: 1 point
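If you’d rather let a computer do the adding, here’s a minimal sketch of the scorecard in Python. The point values and the “greater than 6” cutoff come straight from the list above; the trait names and function names are made up for this example. It’s just a weighted sum of yes/no answers compared against a threshold, which is roughly the decision-rule half of a logistic regression (with hand-picked weights instead of fitted ones, and no sigmoid).

```python
# Point values copied from the scorecard above.
# The trait names (dictionary keys) are hypothetical labels for this sketch.
SCORECARD = {
    "hard_stem_background": 0,
    "soft_stem_background": 2,
    "non_stem_background": 5,
    "under_1_year_python": 3,
    "never_had_coding_job": 3,
    "weak_independent_learner": 4,
    "missed_logistic_regression_joke": 1,
}

# A total strictly greater than this means "a Master's will probably be helpful".
THRESHOLD = 6


def masters_score(traits):
    """Add up the points for every trait that applies to you."""
    return sum(SCORECARD[trait] for trait in traits)


def masters_probably_helpful(traits):
    """Apply the 'greater than 6' rule to the total score."""
    return masters_score(traits) > THRESHOLD


if __name__ == "__main__":
    # Example: a biology grad who is new to Python and has never coded for work.
    me = ["soft_stem_background", "under_1_year_python", "never_had_coding_job"]
    print(masters_score(me), masters_probably_helpful(me))
```

For the example above the total is 2 + 3 + 3 = 8, which clears the cutoff. A physics grad whose only gap is never having had a coding job scores 0 + 3 = 3, which doesn’t.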

Caveats:

→ Something to think about is whether you need a full Master’s in data science, or a bootcamp. If you choose to do a bootcamp, keep their incentives in mind: are you being asked to pay upfront without the guarantee of getting hired afterwards? Is there a careers service associated with the bootcamp?

→ Most people are skeptical of bootcamps. Rightly so. But what most people miss is that they should be equally skeptical of any university Master’s degree that doesn’t provide a guarantee of placement. Master’s degrees are bootcamps. Treat them that way. If you do one, don’t focus on your grades; focus on what you’re learning. Ask what the postgraduate employment rates look like for your program. Universities have a funny way of convincing their students that an easy program is a good program, or that they’re doing you a favor just by letting you in. This is a psychological game, and one that’s reinforced by the outdated “conventional wisdom” that university degrees have independent value. But your goal is to get hired, not to “put in your time” and get a piece of paper.

→ Even if you complete a Master’s, you’ll have a lot of skills polishing to do. And probably quite a bit more than you expect. But as long as the program is short (NEVER do a 2-year+ Master’s program), and the price tag isn’t too high, it can definitely be worth it.

3. The undergrad

In general, yes, you’ll need an undergrad degree to become a data scientist. Not necessarily because you need the knowledge, but because companies aren’t yet ready to accept the idea that being self-taught + doing a bootcamp and some online courses can actually get you job-ready (even though in some cases it absolutely can).

But here’s the thing with undergraduate degrees.

They’re not jobs. And if you talk to just about anyone in tech, you’ll quickly realize that tech jobs >> school for learning about tech. That’s partly because the curricula that are taught in undergrad are generally about 5 to 10 years out of date. And that can be fine if you’re in a field that doesn’t change a lot, like physics, math or statistics.

But if you’re in engineering or CS, and you get a summer internship at a great company, and you want to defer your degree (or drop out) to get more work experience, that’s something you should 100% consider doing. If the point of your undergrad is to get a job, there’s little purpose in paying more tuition to wrap up if you’ve already secured a position at a company that has enough runway to get you your first 2 years of experience.

Now I’m absolutely not saying that you should just drop out of your undergrad. All I’m saying is that more people should in general be more open to leaving their degrees unfinished *if* they’ve done an internship and that’s converted into a concrete offer of full-time work. It doesn’t happen terribly often, but I suspect that’s in no small part due to the fact that so many undergrads just assume that completing their degree is “what good people do.”

The advice I’ve given here is unconventional in many ways. But in a rapidly developing field like data science, convention can often lag considerably behind what’s optimal. As a society, our perception of the value of graduate education is one of the aspects of conventional wisdom that’s most badly in need of catching up to reality.

None of this means that formal education, or even graduate degrees, aren’t worth obtaining, of course. But no one should take the need for a Master’s or a PhD for granted: if you’re signing up for an M.Sc. just to fit your stereotype of what a good data science career trajectory should look like, you might want to rethink your strategy.

--

Co-founder of Gladstone AI 🤖 an AI safety company. Author of Quantum Mechanics Made Me Do It (preorder: shorturl.at/jtMN0).