
Five Ways to Get Real-Life Data Science Experience Even If You Have No Experience

Develop your skills and get valuable experience

Photo by Vidar Nordli-Mathisen on Unsplash

Introduction

I’m sure you’ve experienced the dreadful loop,

"I can’t get a job because I don’t have experience because I can’t get a job because I don’t have experience…"

On top of that, the pandemic has made an already competitive job market even more competitive. So how can you get any sort of data-related experience and stand out from the rest of the crowd?

In this article, I’m going to go through five productive and valuable ways that you can get real Data Science experience. I chose these five in particular because they will help you gain practical experience and they will also allow you to pad your resume (yes, we live in a world where that’s important).

With that said, let’s dive into it!


1. Take Advantage of Non-Profit Opportunities

It wasn’t until recently that I found out that there are MANY non-profit organizations that recruit volunteers to support data science projects for social causes.

Below are several organizations that I recommend checking out and signing up for:

  • Statistics Without Borders organizes and connects statisticians and data scientists with international non-profit organizations. Some notable organizations that they’ve worked with are UNICEF and the United Nations.
  • Catchafire is another well-known organization that matches professionals who want to donate their time with nonprofits who need their skills. They have a ton of data and analytics opportunities, which you can find if you search "data".
  • The last one that I wanted to share is Solve for Good, which is a platform for non-profits to post projects specifically for data science! All you need to do is sign up and apply for projects that you’re interested in.

As I initially said, these are great opportunities because they’ll give you practical real-world experience, and they will also buff your resume!


2. SQL Case Studies

If you want to be a data scientist, you have to have strong SQL skills. Mode provides three practical SQL case studies that simulate real-life business problems, as well as an online SQL editor where you can write and run queries.

To open Mode’s SQL editor, go to this link and click on the hyperlink where it says ‘Open another window to Mode’.

Learning SQL

If you’re new to SQL, I would first start with Mode’s SQL tutorials where you can learn basic, intermediate, and advanced SQL techniques. Feel free to skip this if you already have a good understanding of SQL.

Case Study 1: Investigating a Drop in User Engagement

Link to the case.

The objective of this case is to determine the cause of a drop in user engagement for Yammer’s project. Before diving into the data, you should read the overview of what Yammer does here. There are four tables that you should work with.

The link to the case will provide you with much more detail pertaining to the problem, the data, and the questions that should be answered.

Check out how I approached this case study here if you’d like guidance.
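To give a feel for the kind of query this case calls for, here is a minimal sketch that counts weekly active users with Python's built-in sqlite3 module. The `events` table, its columns, and the rows are all made up for illustration; they are not Mode's actual schema or data, so treat this as a starting point rather than a solution.

```python
import sqlite3

# Hypothetical table and rows, NOT Mode's actual Yammer data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (user_id INTEGER, occurred_at TEXT, event_type TEXT)"
)
rows = [
    (1, "2014-07-21", "engagement"),
    (2, "2014-07-22", "engagement"),
    (3, "2014-07-23", "engagement"),
    (1, "2014-07-28", "engagement"),
    (2, "2014-07-29", "engagement"),
    (1, "2014-08-04", "engagement"),
    (1, "2014-08-05", "engagement"),  # same user twice in one week
    (4, "2014-08-05", "signup"),      # not an engagement event
]
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)

# Count distinct engaged users per calendar week (weeks start on Monday).
query = """
SELECT strftime('%Y-%W', occurred_at) AS week,
       COUNT(DISTINCT user_id)        AS weekly_active_users
FROM events
WHERE event_type = 'engagement'
GROUP BY week
ORDER BY week
"""
weekly_active = conn.execute(query).fetchall()

for week, users in weekly_active:
    print(week, users)
```

In this toy data the weekly count falls from three users to one. Once you can see a drop like that, the next step is usually to split the same count by device, user segment, or event type to narrow down the cause.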

Case Study 2: Understanding Search Functionality

Link to the case.

This case is more focused on product analytics. Here, you’ll be required to dive into the data and determine whether the user experience is good or bad. What makes this case interesting is that it’s up to you to determine what ‘good’ and ‘bad’ mean and how the user experience will be evaluated.

Case Study 3: Validating A/B Test Results

Link to the case.

One of the most practical data science applications is performing A/B tests. In this case study, you’ll dive into the results of an A/B test where there was a 50% difference between the control and treatment groups. Your task for this case is to validate or invalidate the results after a thorough analysis.
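One sanity check you might run during the analysis is a significance test on the difference between the two groups. The sketch below is a two-proportion z-test using only the standard library. It assumes the metric is a simple rate (for example, the share of users who did something), and the counts are invented, so it is not the case study's data or a full validation on its own.

```python
import math

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) for the difference between two rates."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled rate under the null hypothesis that both groups share one rate.
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Made-up counts: 10% in control vs. 15% in treatment (a 50% relative lift).
z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value only says the difference is unlikely to be noise. Validating the test also means checking things a formula can't tell you, such as whether users were assigned to groups fairly and whether the metric is the right one.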


3. Personal Data Science Projects

One of the best ways to get data science experience is by creating your own machine learning models. This means finding a public dataset, defining a problem, and solving the problem with machine learning.

Kaggle is one of the world’s largest data science communities with hundreds of datasets that you can choose from. Below are a couple of ideas that you can use to get started.

World University Rankings

Link to dataset here.

Photo by Vasily Koloda on Unsplash

Do you think your country has the best university in the world? What does it mean to be the ‘best’ university to start with? This dataset contains three global university rankings. Using this data, see if you can answer the following questions:

  • What countries are the top universities in?
  • What are the main factors that determine one’s world ranking?

Detecting Credit Card Fraud

Link to dataset here.

Photo by rupixen.com on Unsplash

This dataset presents transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions. Learn how to work with unbalanced datasets and build a credit card fraud detection model.
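To see why the imbalance matters, here is a quick sketch using only the two counts quoted above. It shows how misleading plain accuracy is on this data and computes one common set of class weights. The weighting formula is the "balanced" heuristic (the same one scikit-learn documents for `class_weight="balanced"`); the function name and labels are my own, and this is just one of several ways to handle imbalance.

```python
n_total = 284_807   # transactions, from the dataset description
n_fraud = 492       # fraudulent transactions
n_legit = n_total - n_fraud

fraud_rate = n_fraud / n_total
# A model that always predicts "not fraud" still looks very accurate.
baseline_accuracy = n_legit / n_total

def balanced_class_weights(counts):
    """Weight each class by n_samples / (n_classes * class_count)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * c) for label, c in counts.items()}

weights = balanced_class_weights({"legit": n_legit, "fraud": n_fraud})

print(f"fraud rate: {fraud_rate:.5f}")
print(f"always-legit accuracy: {baseline_accuracy:.5f}")
print(weights)
```

Because the do-nothing baseline is already above 99.8% accurate, metrics like precision, recall, or the area under the precision-recall curve are usually a better way to judge a fraud model.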

Loan Prediction Forecast

Link to dataset here.

Photo by Dmitry Demidko on Unsplash

Taken from Analytics Vidhya, this dataset has 615 rows and 13 columns on past loans that have and haven’t been approved. See if you can create a model that predicts whether a loan will get approved or not.


4. Pandas Practice Problems

When I first started developing machine learning models, I found that my lack of Pandas skills severely limited what I could do. Unfortunately, unlike for Python and SQL, there aren’t many resources on the internet that allow you to practice your Pandas skills.

A few weeks ago, however, I came across this resource, a repository full of practice problems specifically for Pandas. By completing these practice problems, you’ll know how to:

  • Filter and sort your data
  • Group and aggregate data
  • Use .apply() to manipulate data
  • Merge datasets
  • And much more.
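If you haven't used these operations before, the short sketch below touches each one on two tiny made-up tables. The `orders` and `customers` frames and their columns are invented for illustration and are not taken from the repository's exercises.

```python
import pandas as pd

# Tiny, made-up tables for illustration only.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4, 5],
    "customer_id": [10, 11, 10, 12, 11],
    "amount": [25.0, 40.0, 15.0, 60.0, 5.0],
})
customers = pd.DataFrame({
    "customer_id": [10, 11, 12],
    "region": ["East", "West", "East"],
})

# Filter and sort: orders of at least 20, largest first.
large_orders = orders[orders["amount"] >= 20].sort_values(
    "amount", ascending=False
)

# Group and aggregate: total spend per customer.
spend_per_customer = orders.groupby("customer_id", as_index=False)["amount"].sum()

# Use .apply() to derive a new column from an existing one.
orders["size"] = orders["amount"].apply(
    lambda amount: "large" if amount >= 20 else "small"
)

# Merge datasets: attach each customer's region, then total by region.
merged = orders.merge(customers, on="customer_id", how="left")
spend_per_region = merged.groupby("region")["amount"].sum()

print(large_orders)
print(spend_per_customer)
print(spend_per_region)
```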

If you can complete these practice problems, you should be able to confidently say that you know how to use Pandas for data science projects. It will also help you out significantly for the next section.

5. Be a Kaggle Grandmaster

If you don’t know what Kaggle is, I highly recommend that you take the time to explore it and see what it has to offer. In my opinion, Kaggle for Data Scientists is like Leetcode for Software Engineers.

Kaggle allows you to showcase your data science projects, your underlying code, and how active you are! There are three main ways that you can be a Kaggle Grandmaster:

A) Compete in competitions

In my opinion, there’s no better way of showing that you’re ready for a data science job than to showcase your code through competitions. Kaggle hosts a variety of competitions that involve building a model to optimize a certain metric.

Two competitions that you can try right now are:

  1. Titanic: Machine Learning from Disaster
  2. House Prices: Advanced Regression Techniques

B) Create and share datasets

In order to be a good data scientist, you have to have good data to start with! Creating datasets through web scraping or other means and sharing them with the rest of the community is a great way to practice producing clean, usable data.

C) Conduct EDA and build models to share with others

Perhaps the best part of Kaggle is that there are thousands of datasets for you to explore and build models with. Not too long ago, to give an example, I built an extremely simple recommendation system for cooking recipes using pairwise association. I also took advantage of one of the coronavirus datasets to see how the spread of COVID-19 has evolved since the beginning of the year (check it out here).
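To show how little code a first project like this can take, here is a rough sketch of one way to do pairwise association: count how often two ingredients appear in the same recipe, then recommend the most frequent partners. The recipes and the `recommend` helper are made up for this example, and this is not the code from my own project.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Made-up recipes, each represented as a set of ingredients.
recipes = [
    {"tomato", "basil", "garlic"},
    {"tomato", "garlic", "onion"},
    {"basil", "garlic", "pine nuts"},
    {"tomato", "basil", "mozzarella"},
]

# pair_counts[a][b] = number of recipes that contain both a and b.
pair_counts = defaultdict(Counter)
for ingredients in recipes:
    for a, b in combinations(sorted(ingredients), 2):
        pair_counts[a][b] += 1
        pair_counts[b][a] += 1

def recommend(ingredient, top_n=2):
    """Return the ingredients that co-occur most often with `ingredient`."""
    ranked = sorted(
        pair_counts[ingredient].items(),
        key=lambda item: (-item[1], item[0]),  # by count, ties by name
    )
    return [name for name, _ in ranked[:top_n]]

print(recommend("tomato"))
```

Raw co-occurrence counts favor ingredients that are common everywhere, so a natural next step is to normalize them, for example with lift or confidence from association rule mining.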


Thanks for Reading!

I hope that you find these resources and ideas helpful in your data science journey!

Not sure what to read next? I’ve picked another article for you:

A Complete 52 Week Curriculum to Become a Data Scientist in 2021

and another one!

How I’d Learn Data Science if I Could Start Over (2 years in)

Terence Shin

