
Five of the Best Data Science Projects from 2021

A Curated List of the Best Kaggle Notebooks from 2021

Be sure to subscribe here or to my exclusive newsletter to never miss another article on data science guides, tricks and tips, life lessons, and more!

Introduction

I love Kaggle.

When I started my journey learning data science, Kaggle was one of the main resources that accelerated my learning. Not only does Kaggle have great beginner tutorials, but it also has great CODE that you can learn from!

In the same way that you can get better at a sport simply by observing professionals play, you can significantly improve your data science skills by going through other data scientists’ analyses and code.

Why You Should Take Advantage of Kaggle

There are a lot of benefits to going through other people’s code:

  1. You can learn about new packages, methods, and techniques that you previously didn’t know about.
  2. You can learn about better approaches than what you currently know, like strong machine learning models or faster ways to iterate through millions of rows.
  3. You can learn how to approach a particular problem, whether it be dealing with high-dimensional data, image processing, or conducting time-series analyses.

Believe it or not, Kaggle is one of the first places that I check when I’m faced with a new problem that I’m not 100% sure how to approach. And that’s why I highly recommend that you take the time to go through this curated list of Kaggle Notebooks.

So without further ado, here are five of the best Kaggle Notebooks that I’ve seen in 2021:




1. Does Hosting the Olympics Improve Performance?

Photo by Matt Lee on Unsplash

Does hosting the Olympics improve performance?

This analysis (link above) was conducted by Josh, a Data Scientist at AWS. He wanted to see whether there’s a statistically significant difference in a country’s medal count when it hosts the Olympics versus when it doesn’t.

There’s a reason why this analysis is first on the list. In fact, there are two reasons:

One, I found the question itself quite intriguing. In the back of my mind, I always had a feeling that the answer was "yes", but I never knew for sure. To see someone actually take the initiative to find out is inspiring!

And two, the analysis itself is brilliant. While it’s still a work in progress, the methodology is sound, the process makes sense, and the visualizations are beautiful.

In this analysis, you’ll see Josh answer several questions including:

  • Which countries have won the most medals at the Olympics?
  • How do countries perform when they host the Olympics vs when they don’t? Is there a statistically significant difference between the two?
  • What does the distribution of medals look like?
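Comparing host and non-host years is a two-sample question. As one illustrative way to test it, the sketch below runs a permutation test on the difference in mean medal counts. The permutation test is my own choice here, not necessarily the method used in Josh's notebook, and the medal counts are made-up placeholders rather than real Olympic data.

```python
import random
from statistics import fmean


def permutation_test(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns the observed difference (mean of group_a minus mean of group_b)
    and an estimated p-value.
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    observed = fmean(group_a) - fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign the "host" / "non-host" labels
        diff = fmean(pooled[:n_a]) - fmean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one correction keeps the estimate away from an impossible p = 0
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


# Hypothetical medal counts (illustration only, not real Olympic data)
host_medals = [38, 42, 35, 40, 44]
non_host_medals = [25, 28, 22, 30, 27, 24, 26]

observed, p = permutation_test(host_medals, non_host_medals)
print(f"host minus non-host mean: {observed:.2f} medals, p ~ {p:.4f}")
```

Because a permutation test only shuffles group labels, it does not assume that medal counts are normally distributed, which is convenient for small, skewed samples like these.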



2. Netflix Data Visualizations and Trends

Photo by Thibault Penin on Unsplash

Netflix Data Visualization

This next analysis was also conducted by Josh. What can I say? His analyses are incredible! This one has several unique data visualizations related to Netflix content, and I’d say they’re quite insightful!

Some of the questions he answers include:

  • Which countries watch the most Netflix?
  • What is the ratio between movies and TV shows?
  • How much content is added to Netflix each year?
  • What age groups does Netflix target?
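The movie-to-TV-show ratio and the number of titles added per year both reduce to simple counting. This minimal sketch assumes each title is a record with a `type` field ("Movie" or "TV Show") and a `date_added` string such as "September 25, 2021"; those field names and the rows themselves are hypothetical, not taken from Josh's notebook.

```python
from collections import Counter
from datetime import datetime

# Hypothetical rows mimicking a Netflix titles dataset; the "type" and
# "date_added" field names are assumptions for illustration.
titles = [
    {"type": "Movie", "date_added": "September 25, 2021"},
    {"type": "TV Show", "date_added": "September 24, 2021"},
    {"type": "Movie", "date_added": "March 1, 2020"},
    {"type": "Movie", "date_added": "December 15, 2019"},
    {"type": "TV Show", "date_added": "July 2, 2020"},
]

# Ratio between movies and TV shows
type_counts = Counter(row["type"] for row in titles)
movie_to_tv_ratio = type_counts["Movie"] / type_counts["TV Show"]

# How many titles were added in each year (parses e.g. "March 1, 2020")
added_per_year = Counter(
    datetime.strptime(row["date_added"], "%B %d, %Y").year for row in titles
)

print(type_counts)
print(f"movies per TV show: {movie_to_tv_ratio:.2f}")
print(sorted(added_per_year.items()))
```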

3. Data Science and STEM Salaries

Photo by Sharon McCutcheon on Unsplash

Salary Data EDA

This analysis was conducted by Jack Ogozaly, and it looks at salary data for data scientists and other STEM-related professions. The data that was used for this analysis was taken from levels.fyi.

I enjoy any type of analysis related to STEM salaries because I feel like it’s always good to know what the going rate is for your job (or desired job). Also, this analysis in particular includes information that you wouldn’t normally see on other websites, like the average salary by race.

Some of the questions you can expect to be answered include:

  • How does compensation change based on years of experience?
  • What are the most common STEM jobs?
  • What level of education do most data scientists have?
  • Which companies pay the highest salaries for data scientist roles?
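To see how compensation changes with years of experience, one common approach is to bucket experience into bands and compare a typical salary per band. The sketch below assumes records with hypothetical `yearsofexperience` and `totalyearlycompensation` fields and invented values; it is an illustration, not Jack Ogozaly's code.

```python
from collections import defaultdict
from statistics import median

# Hypothetical records; field names and numbers are invented for illustration.
records = [
    {"yearsofexperience": 1, "totalyearlycompensation": 120_000},
    {"yearsofexperience": 2, "totalyearlycompensation": 135_000},
    {"yearsofexperience": 4, "totalyearlycompensation": 180_000},
    {"yearsofexperience": 6, "totalyearlycompensation": 210_000},
    {"yearsofexperience": 7, "totalyearlycompensation": 250_000},
    {"yearsofexperience": 12, "totalyearlycompensation": 320_000},
]


def experience_bucket(years):
    """Map years of experience to a coarse band (band edges are arbitrary)."""
    if years < 3:
        return "0-2"
    if years < 6:
        return "3-5"
    if years < 10:
        return "6-9"
    return "10+"


# Group compensation values by experience band
by_bucket = defaultdict(list)
for row in records:
    band = experience_bucket(row["yearsofexperience"])
    by_bucket[band].append(row["totalyearlycompensation"])

median_by_bucket = {band: median(values) for band, values in by_bucket.items()}
print(median_by_bucket)
```

The median is used instead of the mean because compensation data is usually right-skewed, so a few very large packages would pull the mean upward.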

4. Reddit EDA and Text Analysis

Photo by Brett Jordan on Unsplash

Reddit Vaccine Myths EDA and Text Analysis

This analysis was conducted by Kheirallah Samaha, who used NLP techniques on Reddit posts to determine the average sentiment around vaccines over the past few years. He ends the analysis with some final reasons why many people hesitate to get vaccinated.

I really enjoyed this analysis because it leveraged network graphs and NLP techniques very well. Some questions you can expect him to answer include:

  • What words are the most used and the most correlated with each other?
  • How has the level of positive or negative sentiment changed over time? Did it change significantly during and after COVID-19?
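One way to measure which words are "most correlated" is to compare how often two words appear in the same post with how often each appears without the other. The phi coefficient is a standard measure for that; the sketch below computes word frequencies and pairwise phi values over a handful of invented posts. The posts, the stopword list, and the choice of phi are assumptions for illustration, not details from Kheirallah Samaha's analysis.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Hypothetical posts for illustration; not taken from the Reddit dataset.
posts = [
    "vaccine side effects are rare",
    "side effects of the vaccine worried me",
    "got my second dose today",
    "second dose side effects were mild",
]

# Tiny hand-picked stopword list (an assumption, not a standard list)
STOPWORDS = {"are", "of", "the", "me", "my", "got", "today", "were"}


def tokenize(text):
    """Lowercase, split into words, and drop stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


token_lists = [tokenize(p) for p in posts]
word_counts = Counter(w for tokens in token_lists for w in tokens)  # most-used words


def phi(word_a, word_b, docs):
    """Phi coefficient between the presence of two words across documents."""
    n11 = n10 = n01 = n00 = 0
    for doc in docs:
        a, b = word_a in doc, word_b in doc
        if a and b:
            n11 += 1
        elif a:
            n10 += 1
        elif b:
            n01 += 1
        else:
            n00 += 1
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


doc_sets = [set(tokens) for tokens in token_lists]
pairs = {(a, b): phi(a, b, doc_sets) for a, b in combinations(sorted(word_counts), 2)}

print(word_counts.most_common(3))
print(sorted(pairs.items(), key=lambda kv: kv[1], reverse=True)[:3])
```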



5. 2021 Happiness and Population Analysis

Photo by Larm Rmah on Unsplash

[Awesome EDA] 2021 Happiness & Population

Finally, this last analysis was also conducted by Josh. Specifically, it is an exploratory data analysis of the factors that make a country happy or unhappy. The data that he uses is taken from the 2021 World Happiness Report.

Some questions that you can expect to have answered include:

  • Which countries are the happiest? The most unhappy?
  • How do country scores change over time? Are they relatively consistent?
  • What factors are directly related to a country’s happiness score?
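Checking which factors relate to a country's happiness score often starts with a correlation between each factor and the score. This is a minimal sketch with a hand-written Pearson correlation; the factor names loosely echo the World Happiness Report, but they and the four rows of values are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented values for four hypothetical countries (not real report data)
happiness_score = [7.8, 6.9, 5.5, 4.1]
factors = {
    "gdp_per_capita": [10.8, 10.5, 9.4, 8.1],
    "social_support": [0.95, 0.91, 0.80, 0.62],
    "generosity": [0.10, -0.05, 0.02, 0.08],
}

correlations = {name: pearson(values, happiness_score) for name, values in factors.items()}

# List factors from strongest to weakest relationship with the score
for name, r in sorted(correlations.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{name}: r = {r:+.2f}")
```

A strong correlation only shows that a factor moves together with the score; on its own it does not show that the factor makes a country happier.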

Thanks for Reading!

If you enjoyed this, be sure to subscribe here or to my exclusive newsletter to never miss another article on data science guides, tricks and tips, life lessons, and more!

I hope you found this useful! Take the time to go through not only the insights but also how they were coded. I genuinely believe that is one of the best ways to accelerate your learning.

As always I wish you the best in your learning endeavors! 🙂


Not sure what to read next? I’ve picked another article for you:

The 10 Best Data Visualizations of 2021

and another one:

All Machine Learning Algorithms You Should Know in 2022

Terence Shin

