The State of Data Science and Machine Learning, Part 1: Education, job titles, and skills

Derrick Mwiti
Towards Data Science
4 min read · Feb 17, 2018


Late last year, Kaggle conducted a massive survey of more than 16,000 data scientists to dive into questions around their education levels, undergraduate majors, job titles, salaries, and much more.

The survey posed nearly 300 questions and provided a truly massive amount of insight into the types of roles and backgrounds working data scientists have today, even though the dataset contained a lot of null values.

In digging through the data, I focused my energy on insights that would be relevant to somebody who is just starting or looking forward to starting a career in data science and machine learning.

In the first of two pieces based on this data, let’s dive into the insights a beginner can draw from this survey, specifically around education level, job titles, and skills.

Let’s start off by doing some basic exploratory data analysis.

We shall explore the data by analyzing the various questions the respondents were asked. The image below shows the initial steps: importing scientific tools such as numpy, pandas, matplotlib, and seaborn, and loading the data sets.
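As a rough sketch of that loading step, the snippet below reads a responses file into pandas and reports how much of each column is missing. The file name `multipleChoiceResponses.csv`, the `ISO-8859-1` encoding default, and the helper names `load_survey` and `null_share` are assumptions made for illustration, not details confirmed by this article.

```python
import pandas as pd


def load_survey(path, encoding="ISO-8859-1"):
    """Read the survey responses into a DataFrame.

    The encoding default is an assumption; change it if your copy of the
    file is stored as UTF-8.
    """
    # low_memory=False makes pandas read the whole file before inferring
    # dtypes, which avoids mixed-type warnings on wide survey files.
    return pd.read_csv(path, encoding=encoding, low_memory=False)


def null_share(df):
    """Fraction of missing values per column, largest first."""
    return df.isnull().mean().sort_values(ascending=False)


# Example usage (the file name is an assumption):
# responses = load_survey("multipleChoiceResponses.csv")
# print(responses.shape)
# print(null_share(responses).head(10))
```

Checking the share of nulls early is useful here because many questions were shown only to a subset of respondents, so a column's counts may describe far fewer people than the full sample.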

Let’s look at the distribution of the age of the people who responded to the survey.

We see that the mean age of the respondents is 32 and that most of the respondents are between the ages of 25 and 35.
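The two numbers quoted above could be computed along these lines. This is a minimal sketch that assumes the age answers live in a column named `Age`; the column name and the helper `age_summary` are hypothetical.

```python
import pandas as pd


def age_summary(df, column="Age", low=25, high=35):
    """Return the mean age and the share of respondents aged low..high."""
    # Coerce to numbers; anything unparseable becomes NaN and is dropped.
    ages = pd.to_numeric(df[column], errors="coerce").dropna()
    # between() is inclusive on both ends by default.
    in_range = ages.between(low, high)
    return {
        "mean": ages.mean(),
        "share_in_range": in_range.mean(),
        "n": len(ages),
    }


# Example usage with a DataFrame called `responses`:
# print(age_summary(responses))
```

Missing ages are dropped before averaging, so `n` tells you how many respondents the summary actually covers.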

Gender Identity, Locale, and Employment Status

A count plot of the gender of the people who participated in the survey shows that the responses were heavily male-dominated, with more than three times as many men responding as women. That reflects trends in the field in general.
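The counts behind such a plot can be checked directly. The sketch below assumes a column named `GenderSelect` with the labels `Male` and `Female`; both the column name and the labels are assumptions, as are the helper names.

```python
import pandas as pd


def category_counts(df, column):
    """Count responses per category, ignoring missing answers."""
    return df[column].value_counts(dropna=True)


def ratio(counts, numerator, denominator):
    """Ratio between two categories of a counts Series."""
    return counts[numerator] / counts[denominator]


# Example usage with a DataFrame called `responses`:
# gender = category_counts(responses, "GenderSelect")
# print(gender)
# print(ratio(gender, "Male", "Female"))
```

The same `category_counts` helper could be reused for the country, employment status, job title, and education questions discussed in the rest of this piece, since each of them is a single-choice categorical column.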

It is clear that the USA and India contributed the largest numbers of respondents.

From this we can tell that most of the respondents are employed full time.

Those who were not employed were asked whether they were enrolled in a degree-granting institution. The figure below shows how they responded. Most of those who answered this question were enrolled in some institution.
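Because this question was only put to a subset of respondents, the analysis is a filter followed by a count. Here is a hedged sketch of that pattern; the column names `EmploymentStatus` and `StudentStatus`, the `Not employed` label prefix, and the helper name are all assumptions for illustration.

```python
import pandas as pd


def enrollment_among_unemployed(
    df,
    status_col="EmploymentStatus",
    student_col="StudentStatus",
    not_employed_prefix="Not employed",
):
    """Share of each enrollment answer among respondents who are not employed."""
    # Keep only respondents whose employment status starts with the prefix.
    # Missing statuses are treated as empty strings so they never match.
    mask = df[status_col].fillna("").str.startswith(not_employed_prefix)
    # Drop people who were eligible for the question but left it blank.
    answers = df.loc[mask, student_col].dropna()
    return answers.value_counts(normalize=True)


# Example usage with a DataFrame called `responses`:
# print(enrollment_among_unemployed(responses))
```

Using `normalize=True` returns proportions rather than raw counts, which makes it easier to say what "most" means for a small subgroup.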

Job Titles, Education, Major and Usefulness of Education

The most common current job title among respondents is Data Scientist, with Software Developer/Software Engineer coming in second.

This shows that the majority of respondents hold at least a Bachelor’s degree, with a Master’s degree being the most common. A significant portion of respondents hold a doctoral degree, while far fewer have little or no formal training.

Clearly, most of the people working in this field majored in mathematics, statistics, or computer science.

Most of the people surveyed said that university education was very helpful, which is not surprising given that most of them hold Bachelor’s or Master’s degrees.

While respondents felt their education level could help reflect their skills, the vast majority pointed to actual work experience, their portfolio, and online courses and certifications as being far more critical in demonstrating their technical prowess.

Summary of Findings

Mean age of respondents: 32

Most common age range: 25–35

Gender distribution: 75 percent male, 25 percent female

Location: USA (4,197), India (2,704), Other (1,023)

Employment status: Full-time (70 percent)

Level of education: Master’s or Bachelor’s degree

Most popular undergraduate majors: Mathematics, statistics, or computer science

How can you show your skills best: Actual work experience, portfolio, and online courses and certifications

In the next portion of this article, we’ll dive into the following — stay tuned!

  1. How did you first start your machine learning / data science training?
  2. What programming language would you recommend a new data scientist learn first?
  3. Which tool or technology are you most excited about learning in the next year?
  4. Which ML/DS method are you most excited about learning in the next year?
  5. How long have you been writing code to analyze data?
  6. Where should I look for a job?
  7. The mean salary of data scientists in the US
