As you approach the holidays, you’ll finally have time to sip on some hot chocolate, sit by the fireplace, and watch hours and hours of Christmas movies.
Or… You can spend your time doing some awesome data science projects! Because what’s a better way to spend your time than adding another personal project to your GitHub repo 😉
And so, I give you 12 data science projects for 12 days of Christmas! Each one can be finished in a day, and each is sure to teach you valuable and applicable skills.
With that said, let’s dive into it:
1. Python Simulations

Building simulations is not only really cool, but also quite relevant given the pandemic! Python simulations are very beneficial to your coding fluency and your understanding of data science, and they are also fun and addictive to play around with.
There are myriad scenarios and factors you can simulate, often in fewer than a couple hundred lines of code. For example, I have articles about simulating a basic pandemic and predicting population control, both of which include code that you’re free to look at!
Difficulty: Anywhere from trivial to super complex!
Where to start:
- Need inspiration? Check out my articles, "Building Simulations in Python – A Step by Step Walkthrough" and "Simulating the Pandemic in Python"
Skills you’ll learn:
- Object-oriented programming
- Simulating randomness in Python
- Modelling real-life scenarios
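To give a flavour of what such a simulation can look like, here is a minimal sketch of a toy outbreak model. It is not the code from my articles; the `Person` class, the `simulate` function, and every parameter value (contacts per day, transmission probability, recovery time) are assumptions chosen purely for illustration.

```python
import random


class Person:
    """One member of the simulated population."""

    def __init__(self):
        self.state = "susceptible"  # susceptible -> infected -> recovered
        self.days_infected = 0


def simulate(population_size=200, initial_infected=3, contacts_per_day=4,
             transmission_prob=0.05, recovery_days=7, days=60, seed=42):
    """Return a list of daily (susceptible, infected, recovered) counts."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    people = [Person() for _ in range(population_size)]
    for person in people[:initial_infected]:
        person.state = "infected"

    history = []
    for _ in range(days):
        # Snapshot first, so people infected today only start spreading tomorrow.
        infected_today = [p for p in people if p.state == "infected"]
        for person in infected_today:
            # Each infected person meets a few randomly chosen people per day.
            for other in rng.sample(people, contacts_per_day):
                if other.state == "susceptible" and rng.random() < transmission_prob:
                    other.state = "infected"
            person.days_infected += 1
            if person.days_infected >= recovery_days:
                person.state = "recovered"
        counts = tuple(sum(p.state == s for p in people)
                       for s in ("susceptible", "infected", "recovered"))
        history.append(counts)
    return history


if __name__ == "__main__":
    for day, (s, i, r) in enumerate(simulate(), start=1):
        if day % 10 == 0:
            print(f"day {day}: susceptible={s} infected={i} recovered={r}")
```

Try changing `transmission_prob` or `contacts_per_day` and watch how the curve changes. That kind of tinkering is where these projects get addictive.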
2. Retail Analytics

Although it is interesting to simulate the spread of disease or social dynamics, we can find uses of data science and programming in business, too.
Forecasting sales for holidays, like Christmas, is incredibly important for determining how much to produce. Too much and there’s stale inventory. Too little and you’ve lost out on potential revenue.
Below are several resources for you to learn and practice retail sales forecasting.
Difficulty: Intermediate
Where to start:
- Grab this dataset here.
- Watch this video from Analytics University: 10 Data Science Projects in the Retail Industry – YouTube
Skills you’ll learn:
- Predictive modelling, possibly time-series forecasting as well
- Understanding business statistics
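Before reaching for a fancy model, it helps to have a baseline to beat. The sketch below shows one common baseline, the seasonal naive forecast, which simply repeats the value from the same period one season earlier. The sales figures are made up for illustration and do not come from the dataset above.

```python
def seasonal_naive_forecast(history, season_length, horizon):
    """Forecast each future period with the value from one season earlier."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]


def mean_absolute_percentage_error(actual, predicted):
    """Average of |actual - predicted| / |actual|, expressed as a percentage."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100 * sum(errors) / len(errors)


# Made-up monthly unit sales for two years, with a holiday spike in December.
sales = [
    100, 95, 105, 110, 115, 120, 118, 122, 130, 140, 180, 260,   # year 1
    108, 101, 112, 119, 121, 130, 125, 131, 138, 151, 195, 280,  # year 2
]

# Hold out year 2 and forecast it using year 1 only.
train, test = sales[:12], sales[12:]
forecast = seasonal_naive_forecast(train, season_length=12, horizon=12)
print("forecast:", forecast)
print(f"MAPE: {mean_absolute_percentage_error(test, forecast):.1f}%")
```

If your predictive model can't beat this error on held-out data, it probably isn't capturing much beyond the seasonality.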
3. Covid-19’s Impact on Airport Traffic

In a similar vein, let’s bring together science and business to improve our data science skills with a crucial, real-life scenario. In the past nine months, Covid-19 has hugely changed the way we live our lives; in particular, it has had a massive impact on worldwide travel. With the dataset below, explore the data, create visualizations, and even see if you can create a prediction model for airport traffic.
Difficulty: Easy
Where to start:
Skills you’ll learn:
- Exploratory Data Analysis
- Data Visualizations
4. Tweetdeck Replica

If you already use Tweetdeck, this project is for you! Tweetdeck is a tool for Twitter that allows you to track your Twitter engagement and a variety of insights in real time. Using the Twitter API and a visualization tool like Dash or Streamlit, you can build a simple web application that serves as your own analytics platform for Twitter!
Difficulty: Intermediate
Where to start:
- Get familiar with Tweetdeck
- Learn how to engage with APIs and request an API key from Twitter
- Learn about a visualization tool to deploy your visualizations, like Dash or Streamlit
Skills you’ll learn:
- Working with APIs
- Creating interactive insights and analytics dashboards
5. A/B Testing: Click-Through Rates

Arguably one of the most practical data science concepts in the workplace is A/B Testing. And yet, it is a concept that is quite misunderstood because there are a lot of intricacies to it.
More specifically, click-through rate is an extremely important metric for any company with a marketing team. By properly measuring click-through rates, you can optimize the appearance, the messaging, and anything else related to your online advertisements.
Difficulty: Intermediate
Where to start:
Skills you’ll learn:
- Exploratory Data Analysis
- How to conduct a proper A/B test for click through rates
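As a starting point, here is a minimal sketch of one standard way to compare two click-through rates: a two-sided, two-proportion z-test. The function name and the click and view counts are made up for illustration, and a proper A/B test also involves deciding the sample size and stopping rule in advance, which this sketch does not cover.

```python
import math


def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Two-sided z-test for a difference in click-through rates.

    Returns (ctr_a, ctr_b, z, p_value).
    """
    ctr_a = clicks_a / views_a
    ctr_b = clicks_b / views_b
    # Pooled rate under the null hypothesis that both variants share one CTR.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (ctr_b - ctr_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return ctr_a, ctr_b, z, p_value


# Hypothetical experiment: variant A is the current ad, variant B the new one.
ctr_a, ctr_b, z, p_value = two_proportion_z_test(
    clicks_a=200, views_a=10_000, clicks_b=250, views_b=10_000
)
print(f"CTR A = {ctr_a:.2%}, CTR B = {ctr_b:.2%}, z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference in click-through rate is unlikely to be due to chance alone, but it says nothing about whether the difference is large enough to matter for the business.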
6. Recommendation System

The recommendation algorithms used by modern social media platforms and content aggregators are extremely complex and constantly developing. What better way to understand how they work and improve than by building one yourself?
Difficulty: Intermediate-Advanced
Where to start:
- Learn the basis of recommendation systems.
- Walkthrough of building a recommendation system.
- _Check out my basic recommendation system on GitHub_
Skills you’ll learn:
- Building Recommendation Systems
- SVD, matrix factorization
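To make the matrix factorization idea concrete, here is a minimal sketch that learns a small vector for each user and each item using stochastic gradient descent, so that their dot product approximates the known ratings. This is not a true SVD, and it is not the code from my GitHub project; the ratings, function names, and hyperparameters are all made up for illustration.

```python
import random


def predict(users, items, u, i):
    """Predicted rating: dot product of a user vector and an item vector."""
    return sum(uf * vf for uf, vf in zip(users[u], items[i]))


def factorize(ratings, n_users, n_items, n_factors=2, epochs=500,
              learning_rate=0.01, regularization=0.02, seed=0):
    """Learn user and item factor vectors from (user, item, rating) triples."""
    rng = random.Random(seed)
    users = [[rng.uniform(0, 1) for _ in range(n_factors)] for _ in range(n_users)]
    items = [[rng.uniform(0, 1) for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, rating in ratings:
            error = rating - predict(users, items, u, i)
            for f in range(n_factors):
                user_f, item_f = users[u][f], items[i][f]
                # Gradient step on squared error with L2 regularization.
                users[u][f] += learning_rate * (error * item_f - regularization * user_f)
                items[i][f] += learning_rate * (error * user_f - regularization * item_f)
    return users, items


def rmse(ratings, users, items):
    """Root mean squared error of the predictions on the given ratings."""
    squared = [(r - predict(users, items, u, i)) ** 2 for u, i, r in ratings]
    return (sum(squared) / len(squared)) ** 0.5


# Made-up (user, item, rating) triples: users 0-1 favour items 0-1,
# users 2-3 favour items 2-3. The pair (user 0, item 3) is left unrated.
ratings = [
    (0, 0, 5), (0, 1, 4), (0, 2, 1),
    (1, 0, 4), (1, 1, 5), (1, 2, 2), (1, 3, 1),
    (2, 0, 1), (2, 1, 2), (2, 2, 5), (2, 3, 4),
    (3, 0, 2), (3, 1, 1), (3, 2, 4), (3, 3, 5),
]

users, items = factorize(ratings, n_users=4, n_items=4)
print(f"training RMSE: {rmse(ratings, users, items):.2f}")
print(f"predicted rating for user 0, item 3: {predict(users, items, 0, 3):.2f}")
```

The prediction for the unrated pair is the "recommendation": the model fills in the gaps in the rating matrix using the patterns it learned from everyone else.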
7. Trustpilot Webscraper

Web scraping is simple to learn and extremely useful! Scraping a customer review website, like Trustpilot, is valuable for a company as it allows them to understand review trends (getting better or worse) and see what customers are saying via NLP.
Difficulty: Easy
Where to start:
- Get familiar with how Trustpilot is organised, and decide upon which kinds of businesses you will analyse
- Walkthrough of how to scrape Trustpilot reviews.
Skills you’ll learn:
- Webscraping data
- Analyzing customer reviews
- Take it further and apply NLP to extract insights from reviews.
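The parsing half of a scraper can be sketched with nothing but Python's built-in `html.parser`. Note that the HTML below is a simplified, made-up structure, not Trustpilot's real markup: the `review` and `review-body` class names and the `data-rating` attribute are assumptions, so you will need to inspect the actual pages (and check the site's terms of use) before adapting it.

```python
from html.parser import HTMLParser


class ReviewParser(HTMLParser):
    """Collect star ratings and review text from a simplified review page."""

    def __init__(self):
        super().__init__()
        self.reviews = []
        self._current = None   # review currently being read, if any
        self._in_body = False  # True while inside the review text paragraph

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        css_class = attrs.get("class") or ""
        if tag == "article" and css_class == "review":
            self._current = {"rating": int(attrs.get("data-rating") or 0), "text": ""}
        elif tag == "p" and css_class == "review-body" and self._current is not None:
            self._in_body = True

    def handle_data(self, data):
        if self._in_body:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "p" and self._in_body:
            self._in_body = False
        elif tag == "article" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.reviews.append(self._current)
            self._current = None


def parse_reviews(html):
    """Return a list of {'rating': int, 'text': str} dictionaries."""
    parser = ReviewParser()
    parser.feed(html)
    parser.close()
    return parser.reviews


# Hypothetical page content standing in for a downloaded review page.
SAMPLE_HTML = """
<article class="review" data-rating="5">
  <p class="review-body">Fast delivery and friendly support.</p>
</article>
<article class="review" data-rating="2">
  <p class="review-body">Parcel arrived late &amp; damaged.</p>
</article>
"""

reviews = parse_reviews(SAMPLE_HTML)
average_rating = sum(r["rating"] for r in reviews) / len(reviews)
print(reviews)
print("average rating:", average_rating)
```

Once the reviews are in a list of dictionaries like this, tracking rating trends over time or feeding the text into an NLP pipeline becomes straightforward.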
8. Customer Segmentation

What do you know, we’ve come full circle back to our challenge on retail analytics! In this problem, however, the goal is to use statistics to cluster customers into similar groups so that you can identify the customer segments that you want to market your business to!
Difficulty: Intermediate-Advanced
Where to start:
- Learn about k-means clustering and hierarchical clustering
- Walkthrough on customer segmentation for online retail
- Another walkthrough on customer segmentation for e-commerce
Skills you’ll learn:
- Clustering techniques
- Dimensionality reduction
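To see what k-means is actually doing under the hood, here is a minimal from-scratch sketch. The customer data (annual spend and number of orders) is made up, and the function names are my own for this example. On a real dataset you would normally scale the features first and use a library implementation.

```python
import math
import random


def nearest_centroid(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def k_means(points, k, iterations=20, seed=0):
    """Cluster points into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            clusters[nearest_centroid(point, centroids)].append(point)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else centroids[c]  # keep an empty cluster's centroid
            for c, cluster in enumerate(clusters)
        ]
    labels = [nearest_centroid(point, centroids) for point in points]
    return centroids, labels


# Made-up customers: (annual spend in dollars, orders per year).
customers = [
    (200, 2), (250, 3), (220, 2), (180, 1),          # occasional shoppers
    (1500, 20), (1600, 22), (1450, 18), (1550, 21),  # frequent big spenders
]

centroids, labels = k_means(customers, k=2)
for customer, label in zip(customers, labels):
    print(customer, "-> segment", label)
print("centroids:", centroids)
```

Because spend is on a much larger scale than order count, it dominates the distance here, which is exactly why scaling matters on real data.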
9. Time Series Forecast on Energy Consumption

This dataset is composed of power consumption data from PJM’s website. PJM is a regional transmission organization in the United States. Using this dataset, see if you can build a time series model to predict energy consumption. In addition to that, see if you can find trends around hours of the day, holiday energy usage, and long term trends!
Difficulty: Intermediate-Advanced
Where to start:
10. Stocks Predictions

What if you want to predict whether Tesla stock will shoot to the mooooon? With time series forecasting, you can try to predict the trajectory of a stock. To make it easier, you can use Prophet, Facebook’s time-series library, which does a lot of the heavy lifting for you.
Difficulty: Intermediate
Where to start:
- Pick a public company and get their data from Yahoo Finance
- Walkthrough on time-series modelling using Prophet
Skills you’ll learn:
- More time series knowledge
- Prophet – Facebook’s Time-Series package
11. Instagram – Likes Prediction

Do you have some pictures you want to post to Instagram, but you are not sure which one will get you the most likes or comments? Well, data science can help you with that! You can create a predictive model based around various factors, such as the hashtags you use, the length of your post description, the number of pictures in a carousel, and throw it all together. From there you can test your ideas against this model, observe the outputs, and find the image format that is most likely to get you the most likes! This is a great project to work on if you are interested in Machine Learning, too.
Difficulty: Difficult!
Where to start:
- Don’t push yourself too far on your first version. Just take factors like brightness of image, length of post description, etc., which can be collected through web scraping or Instagram’s API.
- Format these values and use a machine learning or predictive model to map these to how many likes each post got
- From here, scale up by adding in hashtags, time of posting, etc., and analysing thousands, or even hundreds of thousands, of posts automatically to grow your data set.
- This is a difficult task which can be scaled up indefinitely so don’t be upset if you struggle on your first attempt. It’s why I put this one at the end of the list.
Skills you’ll learn:
- Collecting, cleaning, and manipulating data
- Predictive modelling using machine learning models
12. Resume – Job Application Matcher

The last project, which I wanted to leave a little more open-ended, is creating a resume-job description matcher. By using NLP techniques like latent semantic analysis, see if you can determine how closely a resume matches a job description.
Where to start:
- Learn more about latent semantic analysis here
- Check out a similar idea related to resumes and job descriptions here.
Skills you’ll learn:
- NLP techniques like latent semantic analysis and/or cosine similarity
- Potentially linear algebra and SVD (singular value decomposition)
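As a first step, before latent semantic analysis, you can get surprisingly far with plain cosine similarity over word counts. The sketch below does only that, with no SVD involved, and the job description and resumes are made-up one-liners for illustration.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count each alphabetic word."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


job = "Data scientist with Python, SQL and machine learning experience"
resume_a = "Built machine learning models in Python and wrote SQL pipelines"
resume_b = "Managed restaurant staff and handled customer bookings"

score_a = cosine_similarity(term_counts(job), term_counts(resume_a))
score_b = cosine_similarity(term_counts(job), term_counts(resume_b))
print(f"resume A: {score_a:.2f}, resume B: {score_b:.2f}")
```

Raw word counts treat "and" the same as "Python", so natural next steps are removing stop words, weighting terms with TF-IDF, and then applying SVD to get to latent semantic analysis proper.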
Thanks for Reading!
Well, I hope you enjoyed this article about the twelve best data science projects that you can complete in a day or less, to keep you occupied over the winter holidays! If you were inspired by any of these, I strongly recommend that you attempt at least one.
Terence Shin