
What is the data science community’s favourite media source?


[Image by Adeolu Eletu on Unsplash]

Each year around this time, Kaggle shares the data from its industry-wide survey on its platform and challenges data scientists all over the world to analyze it and present a truly comprehensive view of the current state of data science and machine learning. As a fellow data scientist, after looking at the dataset, I felt it would be a nice exercise to learn more about the patterns in the community.

To make this analysis a bit easier to read, I have divided it into a series of articles.

Since Medium is a platform where we share our ideas and latest discoveries, it makes sense to start by uncovering some of the most popular sources of information and learning within the community. For this article, I have used the Kaggle 2020 survey data.

So, without further ado, let’s dive in and find out who is using what to stay up to date on ML/DS news.

Kaggle, YouTube and blogs are the most popular sources of regular information and learning.

POPULAR MEDIA SOURCES USED BY DATA SCIENCE COMMUNITY

Kaggle notebooks and forums are the most popular source of information and entertainment for data scientists the world over, followed by YouTube and blogs such as Towards Data Science and Analytics Vidhya.

Fig 1 : Popular Media Sources [Image by Author]

Now that we know the most popular sources, let’s dig a bit further and explore whether these preferences change with age, gender, region, education level, years of experience and role.

POPULAR MEDIA SOURCES FOR ALL AGES

Blogs are most popular among late vicenarians, a.k.a. data scientists in their late twenties (25–29 years), followed by early vicenarians (22–24 years). The popularity of blogs seems to decrease as people age. This can be attributed to the fact that most aspiring and serving data scientists are in the 18–40 age group.

Fig 2: Popularity of Blogs by Age Group[Image by Author]

YouTube is the second most popular source of information among data scientists. But let’s see which age groups are more attracted to it. The age distribution is more or less similar to that of the blog audience, the only difference being that slightly more respondents aged 30 and above prefer reading blogs to watching YouTube.

Fig 3: Popularity of Youtube by Age Group[Image by Author]

Kaggle, the winner of everyone’s attention, does well across all age groups. Data scientists of all ages trust Kaggle forums as their go-to source of information and learning.

Fig 4: Popularity of Kaggle by Age Group[Image by Author]

DO WOMEN CHOOSE THEIR MEDIA SOURCE DIFFERENTLY?

History offers plenty of examples of a battle of the sexes. But after analyzing the Kaggle dataset, we find no such battle here. Women and men alike prefer Kaggle, YouTube and blogs to keep themselves updated.

Fig 5: Popularity of Blogs among Sexes[Image by Author]
Fig 6: Popularity of Youtube among Sexes[Image by Author]
Fig 7: Popularity of Kaggle among Sexes[Image by Author]

You might notice a stark difference between the number of men and women among the respondents. This may be due to the underrepresentation of women in data science.

DOES EDUCATION LEVEL LEVEL UP THE CHOICE OF MEDIA SOURCE?

All three media sources, be it Kaggle, YouTube or blogs, find their largest audience among people with a Master’s degree, followed by those with a Bachelor’s degree. This can be attributed to the fact that most data scientists hold either a Master’s or a Bachelor’s degree.

Fig 8: Popularity of Blogs among Education Level[Image by Author]

A very interesting observation is that data science burghers with no formal education past high school, college dropouts and professional degree holders find YouTube to be their favourite source of information and learning.

Fig 9: Popularity of Youtube among Education Level[Image by Author]
Fig 10: Popularity of Kaggle among Education Level[Image by Author]

NATIONALITIES AND PREFERENCE FOR MEDIA SOURCE

When we look at the popularity of media sources across geographies, Kaggle comes out as the most popular source overall. But there are some interesting observations: blogs are a little less popular than YouTube and Kaggle in certain regions, particularly Brazil.

On the other hand, blogs are the most popular source of information in the USA, where nearly 37% of fellow data scientists rely on them. This could be attributed to the popularity of Medium in the USA.

Fig 11: Popularity of Blogs across Geographies[Image by Author]

YouTube is the second most popular source of information in all countries except the USA. Other countries, such as India, Brazil, Russia and Japan, rely on YouTube after Kaggle.

Fig 12: Popularity of YouTube across Geographies [Image by Author]

Kaggle is the undisputed king of media sources in the data science community. This holds true across geographies, except for the USA.

Fig 13: Popularity of Kaggle across Geographies [Image by Author]

SHOULD MY CURRENT ROLE DEFINE MY CHOICE OF MEDIA SOURCE?

Yet again, Kaggle is the most popular media source across all roles. That said, people in Data Scientist and Statistician roles lean towards blogs rather than YouTube as their regular source of information.

Fig 14: Popularity of Blogs across Roles [Image by Author]
Fig 15: Popularity of YouTube across Roles [Image by Author]

Kaggle is the most popular media source for everyone, be it a Data Scientist, Research Scientist, Machine Learning Engineer or any other role.

Fig 16: Popularity of Kaggle across Roles [Image by Author]

DOES PREFERENCE FOR MEDIA SOURCE CHANGE WITH PROGRAMMING EXPERIENCE?

Kaggle is the first choice for most data scientists. But across all platforms we see a general trend: engagement increases progressively for people with 0 to 5 years of programming experience. After that, engagement seems to decrease, perhaps because of a shift towards leadership roles.
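A trend like this is only visible if the experience buckets are plotted in their natural order; sorted alphabetically, "10-20 years" would land before "3-5 years". Here is a minimal sketch of one way to order them by their lower bound. The bucket labels are assumptions modelled on the survey's programming-experience question, and `bucket_lower_bound` is a hypothetical helper, not code from the original analysis.

```python
import re

# Assumed bucket labels (illustrative); deliberately shuffled.
BUCKETS = [
    "10-20 years",
    "< 1 years",
    "3-5 years",
    "20+ years",
    "1-2 years",
    "I have never written code",
    "5-10 years",
]


def bucket_lower_bound(label):
    """Return a sortable number for an experience bucket label."""
    match = re.search(r"\d+", label)
    if match is None:
        # No number in the label (e.g. "never written code"): sort first.
        return -1
    lower = int(match.group())
    # "< 1 years" means below one year, so it must precede "1-2 years".
    return lower - 0.5 if label.strip().startswith("<") else lower


ordered = sorted(BUCKETS, key=bucket_lower_bound)
```

The `ordered` list can then be used as the category order for the x-axis of any of the experience plots.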

Fig 17: Popularity of Blogs across Experience[Image by Author]

YouTube is the go-to media source for people with zero to one year of programming experience.

Fig 18: Popularity of YouTube across Experience [Image by Author]

Kaggle is the favourite media source for data scientists with more than one year of programming experience. This can be attributed to the fact that once you know how to program, you want to practice on challenges, and what could be better for that than Kaggle?

Fig 19: Popularity of Kaggle across Experience [Image by Author]

SAMPLE CODE

I will be uploading the entire analysis to my GitHub. Feel free to check it out. For the time being, here is the code for the visualizations used in this article.

Code for the first count plot:
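What follows is a minimal sketch of how such a count plot could be built, not the exact notebook code. It assumes the media-source question is stored as one column per option (named here `Q39_Part_1`, `Q39_Part_2`, ..., which is an assumption about the survey file), with the option text in the cell when ticked and an empty cell otherwise. The toy rows and the helper names `count_media_sources` and `plot_counts` are illustrative only.

```python
from collections import Counter

# Assumed column names for the multi-select media question.
MEDIA_COLUMNS = ["Q39_Part_1", "Q39_Part_2", "Q39_Part_3"]

# Toy rows standing in for survey responses (made-up values).
rows = [
    {"Q39_Part_1": "Kaggle", "Q39_Part_2": "YouTube", "Q39_Part_3": ""},
    {"Q39_Part_1": "Kaggle", "Q39_Part_2": "", "Q39_Part_3": "Blogs"},
    {"Q39_Part_1": "Kaggle", "Q39_Part_2": "YouTube", "Q39_Part_3": "Blogs"},
    {"Q39_Part_1": "Kaggle", "Q39_Part_2": "YouTube", "Q39_Part_3": ""},
]


def count_media_sources(rows, columns):
    """Count how many respondents ticked each media source."""
    counts = Counter()
    for row in rows:
        for col in columns:
            value = (row.get(col) or "").strip()
            if value:  # an empty cell means the option was not ticked
                counts[value] += 1
    return counts


def plot_counts(counts, title):
    """Draw a horizontal bar chart, largest bar on top."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    labels, values = zip(*sorted(counts.items(), key=lambda kv: kv[1]))
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.barh(labels, values)
    ax.set_xlabel("Number of respondents")
    ax.set_title(title)
    fig.tight_layout()
    return fig


media_counts = count_media_sources(rows, MEDIA_COLUMNS)
```

Calling `plot_counts(media_counts, "Popular Media Sources")` would then render the chart. With the real survey file, the rows could come from `csv.DictReader`.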

Code for the count plots with hue:
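The grouped plots can be sketched in a similar way. This version uses plain matplotlib grouped bars as a stand-in for seaborn's `countplot` with a `hue` argument, so it is an approximation rather than the original code. The column names `Q1` (age), `Q2` (gender) and `Q39_Part_3` (blogs) are assumptions, the toy respondents are made up, and both helper functions are hypothetical.

```python
from collections import Counter

# Toy respondents (made-up values); column names are assumed.
respondents = [
    {"Q1": "22-24", "Q2": "Man", "Q39_Part_3": "Blogs"},
    {"Q1": "22-24", "Q2": "Woman", "Q39_Part_3": "Blogs"},
    {"Q1": "25-29", "Q2": "Man", "Q39_Part_3": "Blogs"},
    {"Q1": "25-29", "Q2": "Man", "Q39_Part_3": "Blogs"},
    {"Q1": "25-29", "Q2": "Woman", "Q39_Part_3": ""},
    {"Q1": "30-34", "Q2": "Woman", "Q39_Part_3": "Blogs"},
]


def grouped_counts(rows, x_col, hue_col, source_col):
    """Count respondents who ticked `source_col`, keyed by (x, hue)."""
    counts = Counter()
    for row in rows:
        if (row.get(source_col) or "").strip():
            counts[(row[x_col], row[hue_col])] += 1
    return counts


def plot_grouped_counts(counts, title):
    """Draw one cluster of bars per x level, one bar per hue level."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    x_levels = sorted({x for x, _ in counts})
    hue_levels = sorted({h for _, h in counts})
    width = 0.8 / len(hue_levels)
    fig, ax = plt.subplots(figsize=(8, 4))
    for i, hue in enumerate(hue_levels):
        positions = [j + i * width for j in range(len(x_levels))]
        heights = [counts.get((x, hue), 0) for x in x_levels]
        ax.bar(positions, heights, width=width, label=hue)
    # Centre each tick under its cluster of bars.
    offset = width * (len(hue_levels) - 1) / 2
    ax.set_xticks([j + offset for j in range(len(x_levels))])
    ax.set_xticklabels(x_levels)
    ax.set_ylabel("Number of respondents")
    ax.set_title(title)
    ax.legend()
    fig.tight_layout()
    return fig


blog_counts = grouped_counts(respondents, "Q1", "Q2", "Q39_Part_3")
```

Swapping `x_col` for the education, role or experience column gives the other breakdowns used in the article.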

Code for geographical plots:
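For the geographic view, raw counts are dominated by the countries with the most respondents, so a share per country is easier to compare. Below is a minimal sketch under the assumption that `Q3` holds the country and `Q39_Part_3` the blogs option; the toy data is made up and the two helpers are hypothetical. It draws a simple bar chart per country, which may differ from the plot type used for the figures above.

```python
from collections import defaultdict

# Toy responses (made-up values); column names are assumed.
responses = [
    {"Q3": "India", "Q39_Part_3": "Blogs"},
    {"Q3": "India", "Q39_Part_3": ""},
    {"Q3": "India", "Q39_Part_3": "Blogs"},
    {"Q3": "India", "Q39_Part_3": ""},
    {"Q3": "Brazil", "Q39_Part_3": "Blogs"},
    {"Q3": "Brazil", "Q39_Part_3": ""},
    {"Q3": "Brazil", "Q39_Part_3": ""},
    {"Q3": "Brazil", "Q39_Part_3": ""},
    {"Q3": "USA", "Q39_Part_3": "Blogs"},
    {"Q3": "USA", "Q39_Part_3": "Blogs"},
    {"Q3": "USA", "Q39_Part_3": "Blogs"},
    {"Q3": "USA", "Q39_Part_3": ""},
]


def share_by_country(rows, country_col, source_col):
    """Fraction of each country's respondents who ticked `source_col`."""
    totals = defaultdict(int)
    selected = defaultdict(int)
    for row in rows:
        country = row[country_col]
        totals[country] += 1
        if (row.get(source_col) or "").strip():
            selected[country] += 1
    return {country: selected[country] / totals[country] for country in totals}


def plot_country_shares(shares, title):
    """Horizontal bar chart of shares in percent, largest on top."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    ordered = sorted(shares.items(), key=lambda kv: kv[1])
    countries = [country for country, _ in ordered]
    percents = [100 * share for _, share in ordered]
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.barh(countries, percents)
    ax.set_xlabel("Respondents using this source (%)")
    ax.set_title(title)
    fig.tight_layout()
    return fig


blog_share = share_by_country(responses, "Q3", "Q39_Part_3")
```

The same dictionary of shares could feed a map-based plot instead, if a choropleth is preferred over bars.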

WRAP UP

According to the analysis, we can conclude that Kaggle is the favourite media source in the data science community, followed by YouTube and blogs. This holds broadly true across geographies, sexes, job roles, age groups, etc., while other players such as Reddit, email newsletters and journals are all at almost the same level.


