Landing a good job in data science can be challenging. Although the field is growing rapidly, the number of people drawn to it, whether out of interest or for financial reasons, is rising sharply as well.
So although demand for good data scientists is high, finding a job as one is difficult. To get hired, you will need to stand out among hundreds, if not thousands, of other applicants.
A good data scientist needs many qualities; some are technical, while others are not. As a data scientist, you need a strong portfolio that clearly demonstrates your technical skillset as well as your soft skills. Most importantly, your portfolio needs to prove that you have a mind hungry for learning.
Data science is a very broad field, and the umbrella term "data science" covers many topics, including all subfields of machine learning, computer vision, artificial intelligence, and natural language processing.
Despite that variety of topics, in order to prove your value as a data scientist, you only need to demonstrate your abilities in the core concepts of data science.
This article discusses 4 types of data science projects that can make your portfolio stand out, strengthen your skillset, and increase your chances of landing your dream job.
Data Cleaning
As a data scientist, you will probably spend close to 80% of your time cleaning data. You can’t build an efficient and solid model on a dataset that isn’t clean and organized.
When you’re cleaning your data, it can take you hours upon hours of research to figure out each column’s purpose in the dataset. Sometimes after hours – and even days – of cleaning, you discover that the dataset you’re analyzing isn’t really suitable for what you’re trying to achieve! Then, you’ll need to start the process all over again.
Cleaning data can be frustrating and daunting. It is, however, an essential part of every data science job, and the way to make it less daunting is to practice.
There are many datasets out there that you can use to practice data cleaning. When you’re looking for a good candidate for a data cleaning project, make sure that the dataset:
- Is spread over multiple files.
- Has a lot of nuances, null values, and many possible cleaning approaches to take.
- Requires a good amount of research to fully understand.
- And most importantly, is as close to a real-life application as possible.
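To make the checklist above concrete, here is a minimal sketch of a cleaning pass over data split across two files. It is only an illustration: the two CSV snippets, the column names, and the choices to impute missing ages with the median and to drop unusable orders are all assumptions made for this example, and it uses only Python's standard library (a real project would more likely use a dataframe library).

```python
import csv
import io
from statistics import median

# Hypothetical toy data standing in for two messy source files.
CUSTOMERS_CSV = """customer_id,name,age
1, Alice ,34
2,Bob,
3,carol,29
3,carol,29
"""

ORDERS_CSV = """order_id,customer_id,amount
100,1,25.50
101,2,N/A
102,3,40.00
103,4,12.00
"""


def read_rows(text):
    """Parse CSV text into a list of dicts with whitespace stripped."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: value.strip() for key, value in row.items()} for row in reader]


def to_number(value, cast):
    """Convert a raw string to a number, or None if it is missing or invalid."""
    try:
        return cast(value)
    except ValueError:
        return None


def clean_customers(rows):
    """Drop duplicate ids, normalize names, and impute missing ages."""
    unique = {}
    for row in rows:
        unique.setdefault(row["customer_id"], row)  # keep the first occurrence
    customers = [
        {
            "customer_id": int(row["customer_id"]),
            "name": row["name"].title(),
            "age": to_number(row["age"], int),
        }
        for row in unique.values()
    ]
    # Assumption for this sketch: fill missing ages with the median known age.
    fill_value = median(c["age"] for c in customers if c["age"] is not None)
    for customer in customers:
        if customer["age"] is None:
            customer["age"] = fill_value
    return customers


def join_orders(customers, order_rows):
    """Attach orders to known customers; skip orders that cannot be used."""
    by_id = {c["customer_id"]: c for c in customers}
    joined = []
    for row in order_rows:
        amount = to_number(row["amount"], float)
        customer = by_id.get(int(row["customer_id"]))
        if amount is None or customer is None:
            continue  # invalid amount or unknown customer
        joined.append(
            {"order_id": int(row["order_id"]), "name": customer["name"], "amount": amount}
        )
    return joined


customers = clean_customers(read_rows(CUSTOMERS_CSV))
orders = join_orders(customers, read_rows(ORDERS_CSV))
print(customers)
print(orders)
```

Each decision here (keep the first duplicate, impute with the median, drop orphaned orders) is one of several defensible options, which is exactly the kind of judgment call a cleaning project in your portfolio should explain.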
Good cleaning datasets (or, as I call them, very messy sets) are often found on websites that collect and aggregate datasets. These websites gather data from various sources without cleaning it up, which makes them great candidates for cleaning projects.
Examples of such websites are:
Exploratory Data Analysis
Once your data is clean and organized, you will need to perform exploratory data analysis (EDA). EDA is one of the most important steps in every data science project. It has many benefits, such as:
- Maximize dataset insights.
- Reveal underlying patterns and structure.
- Extract important information.
- Detect anomalies.
There are many techniques we can follow to perform an efficient EDA, and most of them are graphical in nature, because patterns and anomalies are easier to spot when the data is represented visually. The particular graphical techniques used in EDA tasks are fairly straightforward, for example:
- Plotting the raw data to obtain initial insights.
- Plotting simple statistics on the raw data, such as mean plots and standard deviation plots.
- Focusing the analysis on specific sections of the data for better results.
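As a numeric companion to the graphical techniques above, the following sketch computes simple summary statistics, flags anomalies, and then refocuses the analysis on the remaining values. The `daily_sales` figures are made up for illustration, and the use of Tukey's 1.5 × IQR rule for anomalies is an assumption of this example, not a method prescribed by the article.

```python
from statistics import mean, median, quantiles, stdev

# Hypothetical daily sales figures; the 400 is a planted anomaly.
daily_sales = [102, 98, 110, 95, 105, 400, 99, 101, 97, 103]


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),
        "min": min(values),
        "max": max(values),
    }


def find_outliers(values, k=1.5):
    """Flag values lying more than k * IQR outside the quartiles (Tukey's rule)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


summary = summarize(daily_sales)
outliers = find_outliers(daily_sales)

# Focus the analysis on the typical days by setting the flagged values aside.
typical_days = [v for v in daily_sales if v not in outliers]
typical_summary = summarize(typical_days)

print(summary)
print(outliers)
print(typical_summary)
```

Comparing `summary` with `typical_summary` shows how strongly a single anomaly can pull the mean and standard deviation, which is the kind of observation worth writing up in an EDA project.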
There are many sources where you can learn the basics of EDA and develop an intuition for exploring and finding patterns within your data; one of my favorite courses on the topic is the one offered by Johns Hopkins University on Coursera.
Data Visualization
When a data scientist builds any kind of data science project, they are often building it to uncover secrets and information that can help improve or understand the data in some way.
Most of the time, this is done in an academic or a business-oriented manner. One of the skills that every data scientist must develop is the ability to tell a compelling story with their data.
The best way to tell a story is to visualize it.
There are many publicly available datasets that you can use to practice data visualization, building dashboards, and telling a story with your data. Some of my favorite sources include FiveThirtyEight, Google’s Dataset Search, Data is Plural, and of course we can’t talk about datasets without mentioning Kaggle.
In order to stand out, you need to be a good storyteller, and your data needs to be visualized effectively. Luckily, there are many resources where you can learn and practice your data visualization skills. You can read articles about visualization or go through courses on building effective visualizations.
Machine Learning
One of the things that can make or break your chances of landing a data science job is your machine learning fluency. Sometimes when newcomers join the field, they tend to skip over the basics and jump straight into the more advanced "buzzwords" of the field.
But before you go very deep into such advanced topics, you need to make sure that you’ve built a solid foundation in machine learning basics. Mastering the basics will not only strengthen your skill base but will also give you the knowledge necessary to pick up new and advanced concepts faster and with ease.
Make sure to have projects that cover all the machine learning basics, such as regression (linear, logistic, etc.), classification algorithms, and clustering. Some of my favorite resources on machine learning basics are the machine learning basics chapter of The Deep Learning Book and the Codecademy machine learning course.
Here are some simple, yet powerful machine learning project ideas:
- Loan prediction using a loan prediction dataset.
- Housing price prediction using a housing price dataset.
- Music genre classification.
- Personality prediction using a personality prediction dataset.
- Handwritten character recognition.
- Speech to text or vice-versa.
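To show how small a first version of one of these ideas can be, here is a minimal sketch of the housing price project using simple linear regression fitted by ordinary least squares. The sizes and prices are invented for illustration, and a single-feature model written in plain Python is an assumption of this example; a real project would use a real dataset, more features, and a held-out test set.

```python
# Hypothetical training data: house size in square meters and price in thousands.
sizes = [50, 70, 90, 110, 130]
prices = [150, 200, 260, 310, 360]


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predict the target for a single input value."""
    return slope * x + intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys that the fitted line explains."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - predict(x, slope, intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


slope, intercept = fit_line(sizes, prices)
estimate = predict(100, slope, intercept)
score = r_squared(sizes, prices, slope, intercept)

print(f"price = {slope:.2f} * size + {intercept:.2f}")
print(f"estimated price for 100 square meters: {estimate:.1f}")
print(f"R^2 on the training data: {score:.4f}")
```

Writing the fit by hand once, before reaching for a library, is a reasonable way to demonstrate that you understand what the model is doing rather than only how to call it.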
Takeaways
Landing a good job in data science can be quite challenging due to the huge pool of applicants and people interested in the field. To stand out among others, your portfolio needs to prove that you have a solid foundation of the basic concepts of data science.
A strong foundation means you will be able to learn, implement, and adapt to new models and algorithms with ease. This article laid out 4 types of data science projects that can help increase your chances of landing your dream job. These 4 types of projects are:
- Data cleaning projects.
- Exploratory data analysis projects.
- Data visualization projects (preferably interactive ones).
- Machine learning projects (clustering, classification, and NLP).
Having these projects will prove that you have a solid foundation in data science. However, they are not enough on their own to get you a job; you also need to work on your soft skills, such as communication, storytelling, and a basic understanding of business models, and to have a few advanced projects that show the extent of your knowledge.