
You can find the code for this project here.
While software development is still the programming job of choice in the country, data science’s popularity in Nigeria has increased steadily over the years as numerous Nigerians find themselves captivated by its potential and gainfully engaged in the space.
However, there is still much to be done. Modern tools and new developments in data science research are markedly under-utilized, whether from a lack of knowledge or a fear of risk. As a result, data science’s impact on the Nigerian market is not as widespread as it should be, or as it will be once the power in the continuously generated data is harnessed.
Nevertheless, there is a lot to look forward to. As the number of individuals and businesses interested in the field increases (see the 5-year Google Trends data for Data Science), the number of initiatives set up to train Nigerians increases as well. With companies like Data Science Nigeria, Hamoye, and Utiva dedicating time and resources to educating young and old Nigerians on Advanced Analytics, Data Science, and Artificial Intelligence tools and methods, predicting a rising tide is a safe bet.
In this report, we take a look at Kaggle’s annual Data Science and Machine Learning Survey, first exploring the information on Nigerians (their programming tools, ‘stack’, where they learn and consume content, and their companies’ utilization of data science) and comparing it with the information from the rest of the world.
Insights
- The young, young and younger: Our participants are mostly below 40, and the Nigerian sample is even younger.
- Here be Students!: We have a high proportion of students, mostly at Bachelor’s and Master’s level; the Nigerian sample was mostly at Bachelor’s level.
- Educated, Educated!: Overall, with over 50% holding Bachelor’s degrees, Nigeria edges out the rest of the world in terms of traditional university education (Bachelor’s, Master’s, Doctorate).
- As experience goes up, proportion goes down: Our survey had more beginners in coding and machine learning, with a near-linear decline as experience increases.
- You go to university for that?!: A substantially smaller proportion of Nigerians learn data science from universities compared to the rest of the world.
- Data Science Journals are boring, Where’s the Twitter thread?: When consuming data science content, journals, Reddit, and podcasts were the least favored in our Nigerian sample.
- Who is the greatest Dashboarder of them all?: For deployment/sharing tools, more Nigerians preferred Streamlit to Shiny or Plotly Dash; this preference was reversed in the rest of the world.
- Underutilization of Machine Learning methods: A larger percentage of Nigerian businesses don’t use machine learning at all, and very few have a model in production.
To start with, we look at the number of participants in the survey from 2017 to 2020.

Nigerians have increasingly shown up for Kaggle surveys. Ranked by share of respondents, Nigeria held the 39th position in 2017 and holds the 8th place in 2020, with a respondent count (476) that would have earned the 6th position in 2017.
Unlike the rest of the world as a whole, Nigerian participation has not dipped; instead, it peaked in the most recent survey at more than six times the number of Nigerian participants in 2017.
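The ranking described here amounts to counting respondents per country and sorting the counts. Below is a minimal, dependency-free sketch of that idea. The function name `country_rank` and the toy list of responses are assumptions made for illustration; they are not the survey data or the code behind this report.

```python
from collections import Counter

def country_rank(countries, target):
    """Return (rank, count) of `target` when countries are ordered by respondent count."""
    counts = Counter(countries)
    # Sort by descending count; break ties alphabetically so the result is deterministic.
    ordered = sorted(counts, key=lambda name: (-counts[name], name))
    return ordered.index(target) + 1, counts[target]

# Toy responses (hypothetical): one country label per respondent.
toy_responses = (
    ["Country A"] * 5 + ["Country B"] * 4 + ["Nigeria"] * 3 + ["Country C"] * 2
)
rank, count = country_rank(toy_responses, "Nigeria")
print(rank, count)
```

Running the same function once per survey year would give the year-by-year positions; how ties are broken is a design choice of this sketch.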
Demographics of the Participants
To better understand the diverse individuals who participated in the survey, we need to investigate their similarities and how they can be described as a whole.
Age group
Here, we plotted distribution charts of the age groups in the Kaggle survey for both Nigeria and the Rest of the World, for easy comparison and analysis between the samples.
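Because the two samples differ greatly in size, comparing them only makes sense on percentages within each sample rather than raw counts. The sketch below shows one way to compute such shares in plain Python. The field names `sample` and `age_group`, the function `share_by_group`, and the rows themselves are hypothetical, chosen only to illustrate the normalization.

```python
from collections import Counter

def share_by_group(rows, sample_key="sample", group_key="age_group"):
    """Return {sample: {group: percent}} so samples of different sizes are comparable."""
    counts = {}
    for row in rows:
        counts.setdefault(row[sample_key], Counter())[row[group_key]] += 1
    shares = {}
    for sample, counter in counts.items():
        total = sum(counter.values())
        shares[sample] = {g: round(100 * n / total, 1) for g, n in counter.items()}
    return shares

# Toy rows (hypothetical), not survey data.
rows = [
    {"sample": "Nigeria", "age_group": "22-24"},
    {"sample": "Nigeria", "age_group": "22-24"},
    {"sample": "Nigeria", "age_group": "25-29"},
    {"sample": "Nigeria", "age_group": "18-21"},
    {"sample": "Rest of the World", "age_group": "25-29"},
    {"sample": "Rest of the World", "age_group": "25-29"},
    {"sample": "Rest of the World", "age_group": "22-24"},
    {"sample": "Rest of the World", "age_group": "30-34"},
    {"sample": "Rest of the World", "age_group": "40-44"},
]
shares = share_by_group(rows)
print(shares["Nigeria"])
```

The resulting per-sample percentages are what a side-by-side distribution chart would display.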

Kaggle as a whole, if this survey can be taken to be representative, has a population that is heavily skewed towards younger people, with 80% or more of respondents reporting an age below 40.
This is especially true of the Nigerian sample, where most respondents are below 35 years of age and the 22 to 24 age group is larger than any other. This differs from the Rest of the World, whose largest proportion is in the 25 to 29 age group.
Taking a step back, Nigeria is well known for its young population, with over 90% of its 200 million+ people below 55 years of age (CIA Factbook).
Gender
The empowerment of discriminated genders and the elimination of gender gaps is a societal need that all well-meaning individuals should seek to address. Here, we look at the gender distribution in the survey.
📍 _"Non-binary", "I prefer to self disclose" and "I would rather not say" are grouped into "Others" due to their low proportion_ 📍
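The grouping in the note above can be expressed as a small mapping step applied before counting. This is a minimal sketch only: the labels "Man" and "Woman" and the function name `collapse_gender` are assumptions for illustration, and the answers are toy values rather than survey responses.

```python
from collections import Counter

# Labels kept as their own category; every other answer is folded into "Others".
# "Man" and "Woman" are assumed labels for this sketch.
KEPT_LABELS = {"Man", "Woman"}

def collapse_gender(label):
    """Map low-proportion answers to a single "Others" category."""
    return label if label in KEPT_LABELS else "Others"

# Toy answers (hypothetical).
toy_answers = [
    "Man", "Man", "Woman", "Non-binary",
    "I would rather not say", "Man", "I prefer to self disclose",
]
grouped = Counter(collapse_gender(answer) for answer in toy_answers)
print(grouped)
```

Collapsing by an explicit list, rather than by a frequency threshold, keeps the categories identical across both samples.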

As could be expected, there is a heavy imbalance in the proportion of each gender in Kaggle’s survey, with men having clear dominance in this space. This dominance can be said to be representative of the ‘STEM’ world and the larger problem of societal gender gaps.
With men in Nigeria having a larger percentage (82.4%) than their counterparts in the Rest of the World (78.7%), the problem is reiterated. This largely justifies more schemes for underrepresented groups in tech, women in tech, and women in data science, like this Digital Explorers project and similar ones targeting women and other underrepresented genders in Nigeria.
Occupation
Apart from the age and gender variables, an occupational analysis is important in understanding the Nigerian Kaggle population and drilling into which segments are engaged on the platform.

A significantly large population of ‘Kagglers’ as a whole identify as Students. This is a sneak peek at the largest use case for Kaggle, as a number of learning initiatives advise opening a Kaggle account and engaging with the content. However, it could also reflect the number of young new entrants into data science each year; further analysis could be helpful.
For both samples, Nigeria and the Rest of the World, the top six job roles are ‘Student’, ‘Data Scientist’, ‘Data Analyst’, ‘Software Engineer’, ‘Currently not employed’, and ‘Other’ (unlisted data or non-data positions), in different orders.
Kaggle can be described as a data science community with datasets, competitions, notebooks, and courses geared towards data science; therefore, we can say that all occupations visualized here have an interest in or benefit from data science. With this in mind, the 5.7 percent share of Software Engineers in Nigeria, compared to almost double that (10.3 percent) in the Rest of the World, may point to a lesser interest in data science among Nigerian software engineers or to a sharper divide between software engineering and data tasks, which bears looking at later on.
There is a higher percentage of Nigerian Data Scientists, balanced by a lower percentage in other listed data science roles. This suggests a data science space that hasn’t yet settled on the optimal distribution of labor and work roles, leaving individuals to work on end-to-end data science projects alone. It is worth noting that the second largest segment of Nigerians in this survey identifies as currently unemployed, a worrying indication of the state of the environment or industry.
Educational Level
To be successful in data science, one needs broad domain knowledge, as well as an understanding of which tools are needed, why specifically they are needed, and how they are applied. Most job descriptions in the space include educational requirements, most often a graduate degree in Computer Science or a statistical field.
While there is a lot of encouragement for self-paced learners in the form of popular examples of individuals who have gone through the process, the Data Science space still requires a large educational base.
![SCREENSHOT [Google search job listing for Nigerian Data Science Jobs]](https://towardsdatascience.com/wp-content/uploads/2021/02/1qrzDBnSbaO7CPAkNnfLXVQ.png)
![SCREENSHOT [Indeed job search for Data Science Jobs]](https://towardsdatascience.com/wp-content/uploads/2021/02/1YBJJXcBmr3CLZcg8yU4xxA.png)

Among the Nigerian respondents, Bachelor’s degree holders are the largest group, followed by Master’s degree holders. The case is reversed in the Rest of the World, where Master’s level graduates are the largest segment.
Overall, with over 50% holding Bachelor’s degrees, Nigeria edges out the rest of the world in terms of traditional university education (Bachelor’s, Master’s, Doctorate). However, this lead is lost when only postgraduate studies are considered.
There is also a smaller proportion of individuals with no formal education past high school in the Nigerian sample. This might indicate the lower than average chance of such an individual studying or working in data science, more effective federal school initiatives, or even cultural pressure (see the Quora question "Why are Nigerians the most educated (immigrant) group in America").
Going back to our previous insight on ‘Students’ being a large segment of the Kaggle population, it would be beneficial to connect the job role categories with the varied educational levels in order to drill into the Student segment, investigating which educational level they are students of, as well as other insights.
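Connecting job roles with educational levels is a cross-tabulation: count each (role, education) pair, then express each count as a percentage of its role so that roles of different sizes can be compared in a heatmap. The sketch below assumes percentages are computed within each job role; the function name `row_percentages` and the toy pairs are illustrative and are not taken from the survey.

```python
from collections import Counter, defaultdict

def row_percentages(pairs):
    """pairs: iterable of (role, education). Returns {role: {education: percent of that role}}."""
    table = defaultdict(Counter)
    for role, education in pairs:
        table[role][education] += 1
    return {
        role: {
            edu: round(100 * n / sum(counter.values()), 1)
            for edu, n in counter.items()
        }
        for role, counter in table.items()
    }

# Toy pairs (hypothetical), not survey data.
toy_pairs = [
    ("Student", "Bachelor's"),
    ("Student", "Bachelor's"),
    ("Student", "Bachelor's"),
    ("Student", "Master's"),
    ("Research Scientist", "Doctoral"),
    ("Research Scientist", "Master's"),
]
heat = row_percentages(toy_pairs)
print(heat["Student"])
```

Each inner dictionary corresponds to one row of a role-by-education heatmap, and its values sum to roughly 100.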

From the heatmap plotted, we can see that the high proportion of Bachelor’s degree holders in Nigeria allows for a large presence, 50% or more, in all listed job roles with a few exceptions. Looking at those exceptions, the DBA/Database Engineer role had only one Nigerian respondent, who had a Master’s degree. The Data Engineer, Research Scientist, and Statistician roles, as well as the ‘Others’ category, all had more Master’s degrees than Bachelor’s degrees.
This plot showed that Nigerian Student Kagglers were mostly in school for Bachelor’s degrees. Let’s see if it is the same for the Rest of the World.

With the higher share of Master’s degrees in the Rest of the World seen earlier, they are dominant in all job positions except Unemployed, Software Engineer, and Student, which had more Bachelor’s degrees, and the Research Scientist role, which had mostly (59%) Doctoral degrees.
Comparing both plots and their populations, a significant percentage of the Research Scientists have postgraduate degrees, highlighting a higher educational or technical base for the role.
With a better picture of our Nigerian sample (mostly male, mostly below thirty, with a large percentage of Bachelor’s degree students), we can go forward with understanding their interests in data science, what tools they have used, and where they learn and consume data science content.
To start off, we explore their experience in coding or programming and machine learning.

As both samples have relatively younger individuals, as seen previously, it is not surprising that most of the Kagglers here have 5 or fewer years of experience in coding.
The Nigerian sample shows an almost linear descent from relative novices who started programming within the year to individuals with 10 or more years of experience. Comparing the two plots, we see there are no Nigerian individuals with 20+ years of experience in the sample.

Machine learning is a significant part of data science with a wide range of real-world applications. In our survey sample, the largest share of participants had less than a year’s experience in the topic.
While there are differences in percentages, there were few differences in ranking: there was again an almost linear progression, starting with a large percentage of beginners that shrinks as experience increases. Again, the Nigerian sample had no individual with 20+ years of experience.
Data Science Tools
There has been a lot of improvement in the data science and data analytics space as demand for data-fueled decisions and innovations has grown. This has driven the popularity and importance of many tools that help enrich the process. However, there is a lot of individual preference, and companies also have standards for which tools to use.
Here, we look at some of those preferences.

Python remains the most popular language, with SQL a distant second; both are data staples used by most data-related roles. The JavaScript ranking might point to some web experience among our participants, especially the Software Engineers. Our Rest of the World sample has a smaller proportion of JavaScript users, with the top three being Python, SQL, and R.
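Questions such as "which languages do you use" allow several answers per respondent, so the natural measure is the share of respondents who selected each option, and the shares can add up to more than 100%. The sketch below illustrates that calculation in plain Python. The function name `selection_share` and the toy selections are assumptions for illustration, not the survey’s actual layout or results.

```python
from collections import Counter

def selection_share(respondents):
    """respondents: list of sets, one set of selected options per respondent.

    Returns [(option, percent of respondents who selected it)], highest first.
    """
    counts = Counter(option for selected in respondents for option in selected)
    total = len(respondents)
    shares = {opt: round(100 * n / total, 1) for opt, n in counts.items()}
    # Sort by descending share; break ties alphabetically for a stable ranking.
    return sorted(shares.items(), key=lambda kv: (-kv[1], kv[0]))

# Toy selections (hypothetical), not survey data.
toy = [{"Python", "SQL"}, {"Python"}, {"Python", "JavaScript"}, {"SQL", "R"}]
ranking = selection_share(toy)
print(ranking)
```

Dividing by the number of respondents, rather than by the number of selections, is what allows a statement like "most respondents use Python" to be read directly off the ranking.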

Scikit-learn is our most popular framework in both samples. The machine learning library has been a staple in data science since its initial release in 2007, with its beginner-friendly methodologies and its wide range of functions for different use cases. Google’s TensorFlow comes second for both samples. Comparing both plots, our first deviation is the ranking of Keras, which is often used as a companion to TensorFlow for deep learning. Facebook’s PyTorch doesn’t rank in the top 5 in the Nigerian sample.

Although there are software packages that boast predictive modeling capabilities, programming offers more flexibility in the modeling process for data science roles. With this in mind, it is not surprising to see that programming development applications were the most popular tool for analysis in both samples. Basic statistical packages like Excel are often an individual’s first impression of "data" and are still the first contact for structured data in many companies. Numerous improvements to these tools as the data space grew have allowed them to keep their competitive edge.
Cloud-based software still has some improvements to make to be palatable to the general public, especially in Nigeria. However, as its main draw is the easier management of big data, it fulfills a need that has just not been generally realized.
Data Science Learning & Content Consumption
Here, we investigate how our participants learn and get updates on the data science space.

Looking at the platforms where data science learning took place, we see that Coursera had a larger percentage than all comers in both samples, perhaps a nod to its large library of courses on diverse topics and the ability to "audit" most courses for free. Udemy and Kaggle Learn courses came next, in different orders in the two samples.
Comparing both plots, the presence of University courses in the top 5 for the Rest of the World sample might point to the availability of graduate studies targeted at data science, whereas a Nigerian degree can only be adjacent, through Statistics, Mathematics, or Computer Science courses.

Our top 4 ranking is the same in both samples, with similar percentages, highlighting the prominence of these platforms for data science related content: Kaggle, YouTube, Blogs, and Twitter.
The Nigerian participants found journal publications on data science less useful than their peers did, preferring Newsletters, Slack communities, and Course forums over them.

The data science space can be a vocal one, with numerous "influencers", but it also has a lot of people sharing what they have done or learnt daily. Here, we look at the popular platforms for sharing work publicly.
GitHub and Kaggle come out in the Top 3 in both samples. GitHub’s prominence in the programming/developer scene is obvious to anyone in the space: even without the excellent version control capabilities allowing for efficient teamwork, it is an excellent place to build a portfolio with personal or open-source projects.
Kaggle is also a great place to showcase data science talents, as public competitions or just unique datasets draw a large population looking to learn, compete, or build their data science muscles. A substantial number of people, especially in the "Rest of the World" sample, don’t share their work publicly, whether due to work restrictions, personal preference, or other concerns.
It is worth noting that more Nigerian participants chose Streamlit over Shiny and Plotly Dash for their web application and dashboarding needs, a decision that is reversed in the Rest of the World sample. Also, no Nigerian participant has used NBViewer, a tool for rendering Jupyter notebooks (from GitHub or elsewhere) as a webpage for easier sharing or embedding. This use case is a bit niche and has other alternatives, as seen from its low percentage in the Rest of the World.
Company
Here, we look at the structure of companies our participants worked in (this section is filtered for those with company jobs) and how they are using machine learning and data science as a whole.

Most of our participants, especially in the Nigerian segment, work in companies with fewer than 50 employees.
Comparing plots, we see the second largest proportion in the Rest of the World sample is the highest option of 10,000 or more employees. This suggests either that such companies are more common than expected or that their employees are over-represented in this sample, inflating the proportion.

The top two entries for the size of the companies’ data science teams give us a view into how far companies take their utilization of data science. A high percentage of the companies represented here have fewer than three people on their data science team, and the next largest percentage goes to companies with no dedicated data science team at all.
This means either that the company does not use machine learning or data science at all, or that any data science tasks are covered by company members who are not solely dedicated to them.
We see more on that in the next plot.

Proportionally, more Nigerian businesses don’t use machine learning methods. Comparing plots, the two samples have "exploring machine learning methods" or "not using them at all" as their top entries to the question "Does your employer incorporate machine learning methods into their business?".
Very few companies in the Nigerian space already have a model in production, highlighting the under-utilization of machine learning methods.

Both samples have a similar percentage of people who do not pay or have never paid for data science services (machine learning or cloud services). The Nigerian plot shows a linear descent, with most people at zero dollars. We have seen this pattern in the Nigerian sample at least twice, both times relating to the experience of the participants, so we can tentatively posit that this question is related. With more people on the beginner side of things, there is room for improvement.
To round up our analysis, we take a look at some data science related tasks, checking what proportion of people find them important in their day-to-day work.

Most people in both samples perform analysis on data to influence business decisions. In ranking tasks, Nigerians found building data infrastructure more important than any machine learning tasks. Comparing both plots, building prototypes for new applications of machine learning is near the bottom of the ranking for Nigerians while it is second on the list for the Rest of the World.
There is also a higher percentage of people in the Nigerian sample saying the listed tasks are not an important part of their job role. To explore this, we need to connect it to our demographics data, especially the data on the occupations of the participants.

The analysis task can be seen to be important to most data-related job roles. The product/project manager job has the highest need for the other data-related tasks. Statisticians and other unlisted data roles have some of the highest percentages of individuals who indicated that the listed data-related tasks were not important for their work; software engineers are another noteworthy mention.

The data engineers and DBA/database engineers in this sample spend substantially more time building data infrastructure as an important part of their work.
Comparing both plots, a significantly smaller proportion of Statisticians in this sample believe the listed tasks are not important in their work. With most participants in this sample analyzing data to influence business decisions, we can posit that the surveyed Nigerian statisticians have fewer reasons to work on influencing business decisions, perhaps due to employment in the educational or research-based sectors.
Conclusion
Nigeria has a large population. While this is often a weakness, it can also be a strength, given the inspiration of those that have come before, the support of initiatives helping young and old individuals upskill, and the numerous data-science-adjacent data analysts, business analysts, and mathematicians. The numbers on the Kaggle platform will continue to rise as the demand for knowledge and experience increases.
As for the under-utilization of data science methods, ‘the floor is rising’. More open vacancies for data science roles bring an understanding of the impact those roles have on the system, and vice versa, in a cycle that will lead to the adoption of data science, its tools, and its principles.
References
- Google Trends for Data Science in Nigeria, 2015–2020. (2020). Retrieved from https://trends.google.com/trends/explore?date=today%205-y&geo=NG&q=%2Fm%2F0jt3_q3
- CIA Factbook on Nigeria. (2020). Retrieved from https://www.cia.gov/the-world-factbook/countries/nigeria/
- Google job search for Nigerian Data scientist jobs. (2021). Retrieved from website. Screenshot by author.
- Indeed search for Data scientist jobs. (2021). Retrieved from https://www.indeed.com/jobs?q=Data%20Scientist&vjk=d2b1afb5d21692a2. Screenshot by author.
- Quora question on Nigerian immigrants. (2020). Retrieved from https://www.quora.com/Why-are-Nigerians-the-most-educated-group-in-America.
Thanks for reading!