From Academic to Data Scientist in Under a Year: Part I

Anna H. Casey, Ph.D.
Towards Data Science
5 min read · Dec 11, 2020


Photo by Cookie the Pom on Unsplash

At the end of last year, I was leaving a dead-end post-doc doing research in a lab, disillusioned with academia and wanting something more meaningful and challenging. One week ago, I was offered my first-choice, dream job as a Data Scientist at a DS-for-social-good non-profit. How did I go from a Ph.D. in an unrelated field to landing a competitive Data Scientist role in under a year, spending less than $250? The short answer is hard work. The long answer is below.

Step 1: Build a solid mathematical foundation. If you’re in academia, there’s a good chance you’ve already had a stats course or two. Because my Ph.D. was in experimental psychology, I had a number of stats courses under my belt. If you don’t, though, I’d recommend taking the Statistics & Probability course through Khan Academy. You’ll also find at least one linear algebra course immensely helpful.

Cost: Free

Step 2: Learn Python. If you’ve never worked with a programming language before, you’re going to want to do at least one general introduction to Python. I used Codecademy. Don’t get discouraged during this step. I’ve had friends tell me they gave up trying to learn to code because they could never figure out what they were doing wrong. In my experience, at least 80% of writing code is figuring out what you’re doing wrong. Most people who love coding enjoy the problem-solving process. You’re in academia, so I’m going to assume you do, too. If you don’t, then you will likely hate not only coding but data science generally, and I’d recommend a different career track.

Cost: $39.99/month

Step 3: Make a GitHub. I wish I’d had someone to tell me to do this early. During Step 4, you are going to be doing a lot of data science projects. Learn from my mistake: put them in a GitHub repo, comment your code, add READMEs that explain your projects, and just generally make sure they’re neat enough that you’d feel good about showing them to a prospective employer. This is going to be your portfolio.

Cost: Free

Step 4: Science that data. I’m about to write a short love letter to Dataquest.io. I promise they have not paid me to do this, and, in fact, they have no idea I exist (maybe that will change once this article is published). Dataquest is where I spent the majority of my time learning data science. It has a pretty comprehensive curriculum that covers SQL, Git & version control, and the command line. They also review stats, probability, and linear algebra, in addition to teaching Python for data science. They make you implement many of the popular machine learning algorithms by hand before ever using libraries, and that was immensely helpful for getting a deeper understanding (which came in handy during tech screens in the interview process). If you do the full Data Scientist in Python track, you will end up with at least 20 complete data science projects in your GitHub.

Cost: $49/month
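
To give a feel for what writing an algorithm "by hand" in Step 4 can look like, here is a minimal sketch of a k-nearest-neighbors classifier using only the Python standard library. The choice of algorithm, the function name knn_predict, and the tiny dataset are my own assumptions for illustration; this is not taken from the Dataquest curriculum.

```python
from collections import Counter
from math import dist  # Euclidean distance, available in Python 3.8+


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training point's distance to the query with that point's label.
    distances = [
        (dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep only the k closest neighbors.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbors' labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up toy data: two well-separated clusters in 2-D.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.5), (7.0, 6.0), (6.5, 7.0)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.8, 1.7)))
print(knn_predict(points, labels, (6.4, 6.2)))
```

Writing the distance, sort, and vote steps yourself once makes it much easier to explain what a library call is doing when the question comes up in a tech screen.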

Step 5: Demonstrate your understanding. For this last step, you’re going to want to get your hands on some real data. Reach out to non-profits and small businesses in your area and offer your services as a data scientist free of charge. Most of these organizations do not realize they have useful data or understand what can be done with it, so you might need to hold their hands a bit and explain what counts as data and how you can use it. I’d also look at Data is Plural’s archives for interesting datasets and try to come up with a project from those. Avoid Kaggle, because the datasets there are clean and overused. Working with a couple of datasets outside of the ones Dataquest provides will both look better in your portfolio and help you learn to troubleshoot on your own. If you haven’t discovered Stack Overflow by this point in your journey (how is that possible?), you will need it here (and for the rest of your career). A word of caution: if you end up working with a non-profit or small business, make sure you ask them what info they’re comfortable with you including in your portfolio. They may ask you to keep some data private.

Cost: Who knows? Maybe your local small business will throw you a few bucks!

Some notes on workflow and methodology: I did not have a job, so I was able to devote myself full time to learning data science (meaning I spent at least 40 hours per week on my courses and projects). However, I also did not start until mid-April, and I was finished with my coursework by the end of October, so even if you devote half as much time per week as I did (for example, 2 hours a day on weekdays and 5 hours each on Saturday and Sunday) you should be able to complete everything in about a year. I worked in sprints of 55 minutes with a 15-minute break in between each one. This may seem like a long break, but when you’re learning something new and complicated, your brain needs a lot of rest time to really digest. Better to take breaks, and learn efficiently during sprints, than to bang your head against the wall of a steep learning curve. That said, when doing projects, I would frequently get in the flow and completely lose track of time, and I think it can be helpful to ride that wave while you can. Just check in with yourself and make sure you take breaks when you are finding the troubleshooting process to be more difficult than usual.
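
If you want to sanity-check the time budget above against your own schedule, the arithmetic is easy to sketch in Python. The exact dates below are assumptions on my part (the timeline is only given as "mid-April" to "the end of October"), and 40 hours per week is treated as a flat figure even though it was a minimum.

```python
from datetime import date

# Assumed dates: stand-ins for "mid-April" and "the end of October" 2020.
start = date(2020, 4, 15)
end = date(2020, 10, 31)

full_time_hours_per_week = 40              # "at least 40 hours per week"
half_time_hours_per_week = 2 * 5 + 5 * 2   # 2 h on each weekday + 5 h on each weekend day

# Total hours invested at the full-time pace.
weeks_full_time = (end - start).days / 7
total_hours = weeks_full_time * full_time_hours_per_week

# How many weeks the same number of hours takes at the half-time pace.
weeks_half_time = total_hours / half_time_hours_per_week

print(f"Full-time weeks: {weeks_full_time:.1f}")
print(f"Total hours: {total_hours:.0f}")
print(f"Weeks at {half_time_hours_per_week} h/week: {weeks_half_time:.1f}")
```

Under these assumptions the half-time schedule works out to roughly 57 weeks, so about a year. Swap in your own weekly hours to see where you would land.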

Once you’re at a point where you’ve done at least one project with real-world data, you’ll be ready to start applying to jobs. Teaching yourself everything you need to get your first job in data science takes self-discipline, diligence, and patience, but it doesn’t have to take decades and thousands of dollars. How you apply, interview, and complete tech screens will be just as important as the preparation you’ve done so far, so I’ll be covering that in Part II.
