
Which Database Environments are of interest to developers in 2021?

Introduction

An analysis of the expected usage of database environments by developers in 2021

Word Cloud of DBMS Usage Count in 2021 - Image by Author

In recent years, the world has seen a rise in the use of artificial intelligence, both in industry and in research. This increased relevance has created demand for experts across the many areas where artificial intelligence is applied, including Software Developers, Machine Learning Engineers, Data Scientists, and Database Administrators. These roles in turn demand technical know-how in areas such as computer programming, database management systems, statistics, and matrix algebra. In this article, I focus only on the usage of database management systems. More precisely, I report the database environments expected to be popular among software developers in 2021.

To reveal which database environments are expected to be popular, I use the dataset from the 2020 Stack Overflow Annual Developer Survey. I chose it because, over the past decade, Stack Overflow has emerged as one of the world’s largest and most trusted communities of professional software developers. It is therefore worth knowing what these developers say about the database environments they expect to work with in 2021.

The survey had 64,461 respondents from 182 countries and territories. The survey questions were grouped into the following sections: Basic Information, Education, Work and Career, Tech and Tech Culture, Stack Overflow Usage and Community, and Demographics. There were 61 questions posed in the survey. Some of the questions include:

Where do you live?

Which collaboration tools have you done extensive development work in over the past year, and which do you want to work in over the next year? (If you worked with the tool and want to continue to do so, please check both boxes in that row.)

But the question I am most interested in is:

Which database environments have you done extensive development work in over the past year, and which do you want to work in over the next year? (If you both worked with the database and want to continue to do so, please check both boxes in that row.)

Distribution of the survey respondents according to countries and continents

Before answering the question of interest, I want to understand

How does the distribution of the respondents in the survey correspond to their countries (resp. continents) and whether the distribution well represents the population of the countries (resp. continents)?

In the first subsection (Distribution of the survey respondents according to countries), I present my findings based on the countries of the survey respondents. The findings for the continents where the respondents are based are given in the second subsection (Distribution of the survey respondents according to continents).

Distribution of the survey respondents according to countries

In this subsection, I will present my findings to the question below.

How does the distribution of the survey respondents correspond to their countries?

The part of the dataset used to answer this question was the Country column, which asked respondents where they were living.

Figure 1 below shows the percentage of survey respondents by country of residence, in descending order. The United States accounts for the largest share of respondents at 19.46%. The next four countries are India, the United Kingdom, Germany, and Canada, with 13.11%, 6.08%, 6.07%, and 3.42%, respectively.

Figure 1: Survey Respondent Counts and Percentage According to Countries (Only countries with at least 1% of respondents are shown) - Image by Author
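A tally like the one in Figure 1 could be computed along the following lines. This is a minimal sketch using only the Python standard library, not the code behind the figure: the function name `country_shares`, the `min_share` parameter, and the toy data are all hypothetical, and missing answers are assumed to be excluded from the total.

```python
from collections import Counter

def country_shares(countries, min_share=1.0):
    """Return (country, count, percent) rows sorted by descending count.

    Missing answers (None or empty strings) are ignored, and only countries
    holding at least `min_share` percent of the answers are kept.
    """
    counts = Counter(c for c in countries if c)
    total = sum(counts.values())
    rows = [(name, n, round(100 * n / total, 2)) for name, n in counts.most_common()]
    return [row for row in rows if row[2] >= min_share]

# Toy data standing in for the survey's Country column (not real survey values).
sample = ["United States"] * 4 + ["India"] * 3 + ["Germany"] * 2 + ["Ghana", None]
for name, n, pct in country_shares(sample, min_share=15.0):
    print(f"{name}: {n} respondents ({pct}%)")
```

With the real data, `min_share=1.0` would mirror the "at least 1% of respondents" cut-off mentioned in the caption.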

A follow-up question is how well this distribution reflects the populations of the respondents’ countries. That is,

How well do the countries with the highest number of respondents represent their respective populations?

To answer the above question, I introduce a new quantity called Respondent Density, which measures the number of respondents relative to the population of their respective countries. Precisely, it is given by the formula below:

Figure 2: Respondent Density Formula - Image by Author
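The exact formula survives only as an image, so the sketch below assumes one plausible definition: respondents per one million inhabitants. That scaling, the function name `respondent_density`, and the round numbers are assumptions made for illustration and may differ from the definition in Figure 2.

```python
def respondent_density(respondents, population, per=1_000_000):
    """Respondents per `per` inhabitants (an assumed definition, see the text above)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return respondents * per / population

# Illustrative round numbers, not the figures used in the article.
examples = {
    "Country A": (1_000, 10_000_000),      # small country, modest respondent count
    "Country B": (10_000, 1_000_000_000),  # large country, many respondents
}
density = {name: respondent_density(r, p) for name, (r, p) in examples.items()}
ranked = sorted(density, key=density.get, reverse=True)
print(ranked)
```

Under this definition, a country with fewer respondents can still rank higher than one with many more respondents if its population is much smaller, which is the effect discussed for the country rankings.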

I use the 2020 population dataset provided by Tanu N Prabhu to get the population of the countries represented in the Stack Overflow Annual Survey dataset.

The visualisation of the results of my analysis is presented in Figure 3 below. From it, one can see that Sweden, the Netherlands, Israel, Canada, and the United Kingdom are, interestingly, the top five countries by respondent density. Thus, one can say that, relative to their populations, these countries are the best represented in the survey.

Figure 3: Survey Respondents in Percentage and Respondent Density of Countries of Respondents (Only countries with at least 1% of respondents are shown) - Image by Author

On the other hand, the United States, although it accounts for 19.46% of the survey respondents, ranks 8th by respondent density. India, which accounts for 13.11% of the respondents, ranks 16th. One could assume that software developers in these two countries showed less interest in the survey, or that the developers who responded are too few relative to the size of these countries’ populations to represent them well. These are only educated guesses; further research would be needed to establish the actual reason.

Distribution of the survey respondents according to continents

In this subsection, I present the findings of my analysis for a natural follow-up to the question discussed in the previous subsection, namely:

Which continent has the highest number of respondents?

I used the country continent dataset created by Chaitanya Gokhale to group the countries of the survey respondents into their respective continents.

The results of the findings are given in Figure 4 below.

Figure 4: Survey Respondent Counts and Percentages According to Continents - Image by Author
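Grouping respondents by continent amounts to a lookup followed by a count. The sketch below assumes a simple country-to-continent dictionary; the `CONTINENT_OF` excerpt, the function name, and the toy sample are hypothetical stand-ins for the lookup dataset mentioned above, and countries missing from the lookup are simply skipped.

```python
from collections import Counter

# Hypothetical excerpt of a country-to-continent lookup table.
CONTINENT_OF = {
    "United States": "Americas",
    "Canada": "Americas",
    "India": "Asia",
    "Germany": "Europe",
    "Sweden": "Europe",
    "Australia": "Oceania",
}

def continent_shares(countries, lookup=CONTINENT_OF):
    """Percentage of respondents per continent; unmapped countries are skipped."""
    counts = Counter(lookup[c] for c in countries if c in lookup)
    total = sum(counts.values())
    return {cont: round(100 * n / total, 2) for cont, n in counts.most_common()}

sample = ["United States", "Canada", "India", "Germany", "Sweden", "Sweden",
          "Australia", "Atlantis"]  # "Atlantis" is deliberately unmapped
print(continent_shares(sample))
```

Skipping unmapped countries keeps the sketch short; in a real analysis it would be worth logging them, since spelling differences between the two datasets can silently drop respondents.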

From Figure 4 above, one can see that the continent with the highest number of survey respondents is Europe with 38.5%, followed by the Americas, Asia, Africa, and Oceania with 29.08%, 25.59%, 4.21%, and 2.44%, respectively.

In the same vein as the second question posed in the previous subsection, I introduce the Respondent Density measure, restricting it to the population of the various continents of the survey respondents. The question of interest here is as follows:

How well do the results of the previous question represent the continents of the survey respondents?

From Figure 5 below, one can see that Oceania has the highest respondent density. Thus, relative to population, respondents from this continent appear to have shown more interest in the survey than those from the other continents.

Figure 5: Survey Respondent Percentage and Respondent Density of Continents - Image by Author

Survey Respondents Usage of Database Environments

In this section, I present my findings for the main questions of interest below:

Which database environments have you done extensive development work in over the past year, and which do you want to work in over the next year? (If you both worked with the database and want to continue to do so, please check both boxes in that row.)

In the survey dataset, the same question is posed for the columns labelled DatabaseWorkedWith and DatabaseDesireNextYear. The responses of the survey respondents show that on a worldwide scale, there are 14 different database management systems (DBMS) popular among software developers. These include:

Cassandra, Couchbase, DynamoDB, Elasticsearch, Firebase, IBM DB2, MariaDB, Microsoft SQL Server, MongoDB, MySQL, Oracle, PostgreSQL, Redis, SQLite

The rest of this section is divided into two subsections. In the first subsection (Survey Respondents Usage of Database Environments in 2020), I present my findings on the question of interest up until the year 2020, and the second subsection focuses on findings for the year 2021.

Survey Respondents Usage of Database Environments in 2020

In this subsection, I present my findings on the question of interest for the column labelled DatabaseWorkedWith. In this column, the survey respondents were required to indicate the database environments they have done extensive development work in up until the year 2020. In other words, I am interested in the following question.

Which database environments are most popular among the survey respondents?

Figure 6 below shows the usage of the various DBMS among all the survey respondents up until the end of the year 2020.

Figure 6: Survey Respondents Percentage Usage of Database Environments up until the end of 2020 - Image by Author
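Percentages like those in Figure 6 can be obtained by splitting each respondent's answer into individual DBMS names and counting them. The sketch below rests on two assumptions that are not stated in the article: that multiple selections are stored in one cell separated by semicolons, and that each percentage is a share of all selections rather than a share of respondents. The function name and the toy answers are hypothetical.

```python
from collections import Counter

def dbms_shares(answers, sep=";"):
    """Share of each DBMS among all selections, in percent.

    `answers` holds one string per respondent, e.g. "MySQL;Redis";
    None or empty strings (no answer) are skipped.
    """
    counts = Counter()
    for answer in answers:
        if answer:
            counts.update(name.strip() for name in answer.split(sep))
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.most_common()}

# Toy answers, not real survey rows.
worked_with = ["MySQL;PostgreSQL", "MySQL", None, "MySQL;SQLite;PostgreSQL"]
print(dbms_shares(worked_with))
```

The same function could be applied unchanged to the DatabaseDesireNextYear column to obtain the corresponding shares for the following year.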

From Figure 6 above, one can see that the top five popular database environments that the respondents have done extensive development work in over the past year, i.e., until the end of the year 2020, are

MySQL: 20.07%
PostgreSQL: 13.03%
Microsoft SQL Server: 11.90%
SQLite: 11.24%
MongoDB: 9.53%

It is interesting that MongoDB, although a relatively recent arrival in the database community compared with environments such as Oracle, has risen into the top five DBMS among the survey respondents. In the next subsection, I explore how popular MongoDB is likely to be in 2021.

Further, Figure 7 below shows how these top five DBMS rank in the most represented countries.

Figure 7: Percentage Usage of the Top Five DBMS in the Top Countries in 2020 - Image by Author
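A per-country ranking like the one in Figure 7 can be sketched by keeping one counter per country. As before, the semicolon-separated answer format is an assumption, and the function name `top_dbms_by_country`, the `top_n` parameter, and the toy rows are hypothetical.

```python
from collections import Counter, defaultdict

def top_dbms_by_country(rows, top_n=2, sep=";"):
    """Map each country to its `top_n` most selected DBMS names."""
    per_country = defaultdict(Counter)
    for country, answer in rows:
        if country and answer:  # skip rows with a missing country or answer
            per_country[country].update(answer.split(sep))
    return {c: [name for name, _ in cnt.most_common(top_n)]
            for c, cnt in per_country.items()}

# Toy (country, answer) pairs, not real survey rows.
rows = [
    ("Country A", "MySQL;PostgreSQL"),
    ("Country A", "MySQL"),
    ("Country B", "PostgreSQL;MySQL"),
    ("Country B", "PostgreSQL"),
    ("Country B", None),
]
print(top_dbms_by_country(rows))
```

Comparing the first and second entries for each country is one way to spot the close pairings between DBMS discussed for the individual countries.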

From Figure 7 above, MySQL is the most extensively used database environment in the top countries, except for the Russian Federation, where PostgreSQL is ranked first and its competitor, MySQL, is ranked second. One can also see that MySQL is used extensively in Pakistan, India and Italy.

In all these countries, there is competition among at least two of the top five DBMS. Poland is the only country where MySQL itself has a close competitor, namely PostgreSQL. In the United States, India, the Netherlands, Australia and Italy, there is competition between PostgreSQL and Microsoft SQL Server. MongoDB is ranked second in Israel and competes with SQLite in Canada and France.

Survey Respondents Usage of Database Environments in 2021

Finally, I present my findings based on the answers survey respondents gave in the column labelled DatabaseDesireNextYear of the dataset. The question of interest here was

Which database environments would respondents want to be working in over the next year, i.e., in the year 2021?

Figure 8 below shows the expected ranking of the usage of DBMS in 2021.

Figure 8: Survey Respondents Percentage Usage of Database Environments up until the end of 2021 - Image by Author

From Figure 8 above, the top five database environments that are expected to be popular among software developers in 2021 are:

PostgreSQL: 14.30%
MongoDB: 12.96%
MySQL: 12.73%
Redis: 9.69%
SQLite: 8.83%

Comparing the 2021 results with the 2020 figures from the previous subsection, one can see that PostgreSQL has a high likelihood of being popular in 2021 among respondents from all over the world. I draw this conclusion because its share is expected to increase by 1.27 percentage points, moving it from second place in 2020 (13.03%) to first place in 2021 (14.30%).

Next is MongoDB, which makes an impressive move from fifth position in 2020 to second place in 2021, gaining 3.43 percentage points. A possible question to explore is whether MongoDB will become the most popular database environment among respondents worldwide in the coming years, given that it climbed three places in a single year.

MySQL falls by 7.34 percentage points (from 20.07% to 12.73%), dropping from first position in 2020 to third position in 2021. Thus, on a worldwide scale, there is a high likelihood that MySQL will be less popular among the respondents in 2021.

Next is Redis, which ranked sixth among the respondents in 2020. It gains 3.09 percentage points to rank fourth among the DBMS expected to be popular in 2021 among software developers worldwide. This is the second-largest expected gain, after MongoDB’s.

Finally, SQLite falls by 2.41 percentage points, from fourth position in 2020 to fifth position in 2021. It too appears to be losing popularity worldwide among the survey respondents.

It is evident from the results that, on a worldwide scale, Microsoft SQL Server is likely to be less popular among the survey respondents. I draw this conclusion because Microsoft SQL Server falls from third position in 2020 to seventh position in 2021, losing 3.91 percentage points.
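The year-over-year comparison above is simple arithmetic on the two sets of shares. The sketch below recomputes the changes from the rounded percentages quoted in this section, restricted to the four DBMS for which both a 2020 and a 2021 share are given; the dictionary and function names are hypothetical, and the differences are expressed in percentage points.

```python
# Shares (in percent) quoted in the text for DBMS with both years stated.
share_2020 = {"MySQL": 20.07, "PostgreSQL": 13.03, "SQLite": 11.24, "MongoDB": 9.53}
share_2021 = {"PostgreSQL": 14.30, "MongoDB": 12.96, "MySQL": 12.73, "SQLite": 8.83}

def point_changes(before, after):
    """Change in percentage points for every DBMS present in both years."""
    return {name: round(after[name] - before[name], 2)
            for name in before if name in after}

changes = point_changes(share_2020, share_2021)
# List the largest gains first and the largest losses last.
for name, delta in sorted(changes.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {delta:+.2f} percentage points")
```

Because the inputs are already rounded to two decimals, the recomputed differences can deviate slightly from values derived from the unrounded data.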

Figure 9 shows the findings for the same question, restricted to the countries that were most represented in the survey.

Figure 9: Percentage Usage of the Top Five DBMS in the Top Countries in 2021 - Image by Author

Comparing Figure 9 above with Figure 7, one can observe that PostgreSQL is still going to be popular in the year 2021 in the Russian Federation. The popularity of PostgreSQL can also be predicted for the United States, the United Kingdom, Germany, Canada, France, Brazil, the Netherlands, Poland, Australia, Sweden and Turkey. That is, there is a high likelihood that PostgreSQL will be popular among the respondents from these countries in the year 2021.

MongoDB is expected to be the most popular DBMS in the year 2021 among the respondents in India, Spain, Pakistan, and Israel. In Turkey, Brazil, Spain and Italy, MongoDB competes with PostgreSQL. India and Pakistan are exceptions: there, MongoDB competes with MySQL instead.

Interestingly, Italy is the only country where MySQL is expected to maintain its popularity in 2021. Even so, comparing the gaps between MySQL and PostgreSQL, and between MySQL and MongoDB, for the years 2020 and 2021, it is possible that MySQL’s close competitors in Italy, namely PostgreSQL and MongoDB, might take over in 2021.

Conclusion

Most of the survey respondents lived in one of the following countries: the United States, India, the United Kingdom, Germany and Canada.

Using the respondent density measure, I observed that respondents from Sweden, the Netherlands, Israel, Canada and the United Kingdom were most interested in the survey. That is, they represent their populations fairly well. Although the United States and India have more respondents, their respondent density values indicate that they do not represent their populations as well as one might expect.

On the continental level, Europe emerged as the continent with the highest number of respondents for the survey. On the other hand, Oceania was the only continent that was well represented based on the respondent density measure.

On a worldwide scale, MySQL stood out as the most popular database environment among the respondents up until the end of the year 2020. In spite of this, there is a high likelihood that PostgreSQL will overtake MySQL in the year 2021. A rise in interest is also predicted for MongoDB and Redis in 2021. The reason for this interest was not explored in this study, but it would be interesting to investigate. My results also suggest that Microsoft SQL Server is expected to be less popular among the survey respondents in 2021.

It is important to note that the findings in this study are observational and do not constitute a thorough formal study. Nevertheless, they lead to interesting business-oriented research questions, directed not only at software developers but also at the companies that produce these database management systems. Whichever category you find yourself in, software developer or producer of a database management system, the question remains:

What database management system do YOU expect to be popular in the year 2021?

For the technical details about these analyses and more, see the link to my GitHub available here.

