Education as the driver of human development in Brazil

An analysis of Brazilian census data

Fernando Barbalho
Towards Data Science

Image: Freepik

The 2010 census, the most recent one, updated the data used to measure the municipal human development index (MHDI) in Brazil. The country should have conducted the next census in 2020, but data collection was delayed by two years, mainly because of the pandemic. The effects of the pandemic and of the economic crises raise concerns about whether the improvement observed in the 2000 and 2010 censuses has been sustained. While it is not yet possible to make comparisons with the country's current state, we can look more closely at how that past evolution unfolded, using both the coldness of the numbers and the visual impact of the graphs.

The following paragraphs, and especially the images, give a short summary of the changes between 1991 and 2010. I first compare 1991 with 2000, and then 2000 with 2010. The idea is to use clustering techniques to form three groups of municipalities according to the variation in their MHDI, and then to show, from the changes in the variables that make up the MHDI, which factors most influenced the assignment of municipalities to these three groups.

Comparison between 1991 and 2000

The figure below shows how the algorithm distributed Brazilian municipalities among three classes of variation in MHDI between 1991 and 2000.

Classes of MHDI variation. Image by the author.

In the graph, each dot is a municipality. The low variation class, in purple, contains 2167 municipalities whose MHDI varied between 5.9% and 34.8%. The medium variation class, in light green, contains 2107 localities whose MHDI varied between 34.81% and 53.7%. In the last class, in yellow, the variations range from 53.71% to 207% for the remaining 991 municipalities.
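To make the grouping step concrete, here is a minimal Python sketch of how municipalities could be split into three classes by their percentage variation in MHDI. It is not the author's R code: the article does not name the clustering algorithm, so one-dimensional k-means is an assumption, and the municipality names and MHDI values are made up for illustration.

```python
def pct_variation(start, end):
    """Percentage change of an index between two census years."""
    return (end - start) / start * 100.0


def kmeans_1d(values, k=3, iterations=100):
    """Plain 1-D k-means (Lloyd's algorithm).

    Returns (labels, centers); label 0 is the class with the lowest center.
    """
    ordered = sorted(values)
    # Start from evenly spaced quantiles so the result is deterministic.
    centers = [ordered[int((i + 0.5) * len(ordered) / k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign every value to its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Move every center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Made-up (1991, 2000) MHDI pairs, not real census values.
mhdi = {
    "A": (0.60, 0.70),
    "B": (0.55, 0.66),
    "C": (0.40, 0.58),
    "D": (0.38, 0.55),
    "E": (0.20, 0.40),
    "F": (0.18, 0.38),
}

variation = {name: pct_variation(a, b) for name, (a, b) in mhdi.items()}
labels, centers = kmeans_1d(list(variation.values()), k=3)

class_names = ["low", "medium", "high"]
for (name, value), label in zip(variation.items(), labels):
    print(f"{name}: {value:6.1f}% -> {class_names[label]} variation")
```

Note that in this toy data, as in the article, the municipalities that start from the lowest MHDI are the ones that end up in the high variation class.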

Below, we see the municipalities' distribution by the three classes of variation in MHDI.

Maps and the MHDI variation between 1991 and 2000. Image by the author.

The legend applied to the maps indicates that the bluer a municipality's point, the higher its MHDI, and the redder, the lower. The color scale switches at 0.5, the value from which an HDI is considered to indicate medium development.

The maps show that the municipalities with low variation in the MHDI and, at the same time, the highest MHDI values are concentrated in the South and Southeast regions. The cities with high variation, on the other hand, were associated in 1991 with very low MHDI. In 2000, this class showed a general improvement, with some cities even surpassing the medium development threshold (HDI > 0.5). As a result, some towns appear in shades of blue, which did not occur in 1991.

The MHDI database brings together more than 200 variables. Some may have influenced the evolution of the indicators in each municipality more, others less. Therefore, using a procedure known as a decision tree, I tried to identify the characteristics most relevant for assigning the cities to the three groups shown above.

The algorithm indicated eight important variables, all of them related to education. The first one is the school attendance rate of the young population. According to the methodology used, this variable is 2.28 times more important than the second most important, the percentage of the population between 5 and 6 years old attending school. Let's go to the graphs.
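A statement such as "2.28 times more important" comes from comparing importance scores. The Python sketch below is a simplified illustration, not the author's R code: it scores each variable by the largest Gini impurity reduction that a single threshold split achieves, which is the quantity a classification tree evaluates when choosing a split (a full tree would add up such reductions over all of its splits). The variable names and values are made up.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split_gain(feature, labels):
    """Largest impurity reduction achievable with one threshold on `feature`."""
    n = len(labels)
    parent = gini(labels)
    best = 0.0
    # Try every observed value except the largest as a "<= threshold" split.
    for threshold in sorted(set(feature))[:-1]:
        left = [lab for x, lab in zip(feature, labels) if x <= threshold]
        right = [lab for x, lab in zip(feature, labels) if x > threshold]
        children = len(left) / n * gini(left) + len(right) / n * gini(right)
        best = max(best, parent - children)
    return best


# Made-up data: variation class and two hypothetical education variables.
classes = ["low", "low", "medium", "medium", "high", "high"]
features = {
    "school_attendance": [0.60, 0.55, 0.35, 0.30, 0.12, 0.10],
    "pct_5_6_in_school": [0.50, 0.20, 0.60, 0.10, 0.40, 0.30],
}

scores = {name: best_split_gain(values, classes) for name, values in features.items()}
ranking = sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Relative importance of the top variable compared with the runner-up.
ratio = ranking[0][1] / ranking[1][1]
for name, score in ranking:
    print(f"{name}: {score:.3f}")
print(f"top variable is {ratio:.2f} times as important as the second")
```

The ratio in this toy example depends entirely on the invented values; only the way of reading it carries over to the article's figure.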

School Attendance of the Young Population

Four indicators are used to calculate the school attendance of the young population: the percentages of 5 to 6-year-olds attending school, of 11 to 13-year-olds attending the final years of elementary school, of 15 to 17-year-olds who have completed elementary school, and of 18 to 20-year-olds who have completed high school. See below how this indicator separates the three classes of variation in the MHDI between 1991 and 2000.
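As a rough illustration of how four percentages can become a single index on the 0 to 1 scale used in the graphs, the Python sketch below takes their simple arithmetic mean. The article does not spell out the formula, so the equal weighting is an assumption on my part (it matches my understanding of the MHDI methodology), and the function name and input values are hypothetical.

```python
def school_attendance_index(pct_5_6, pct_11_13, pct_15_17, pct_18_20):
    """Combine four percentages (0-100) into one index between 0 and 1.

    Assumption: the four indicators are weighted equally.
    """
    indicators = [pct_5_6, pct_11_13, pct_15_17, pct_18_20]
    for value in indicators:
        if not 0 <= value <= 100:
            raise ValueError(f"percentage out of range: {value}")
    return sum(indicators) / len(indicators) / 100.0


# Made-up percentages for one municipality.
index = school_attendance_index(40.0, 30.0, 20.0, 10.0)
print(f"school attendance index: {index:.2f}")
```

Under this assumption, a municipality can raise its index either by enrolling more young children or by getting older students through school at the expected age.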

Influence of school attendance in MHDI variation between 1991 and 2000. Image by the author

In the graph above, we can see that in the reference year, 1991, the three groups occupy clearly separate positions. The municipalities whose MHDI moved the least, marked by purple dots, had a much higher school attendance rate than the municipalities in the other two classes. More importantly, the municipalities with the largest movement, in yellow, occupied the region marked by low MHDI and low school attendance.

By 2000, on the other hand, the yellow and light green dots have moved closer to the purple dots, showing how much the municipalities in these groups advanced in terms of the school attendance index and how much this advance contributed to the improvement of the MHDI.

Another way to see this movement is through the figure below.

Influence of school attendance in MHDI variation between 1991 and 2000. Image by author

In 1991, the yellow dots are concentrated in a quadrant characterized by very low MHDI and very low school attendance. In practice, almost no municipality in this class reaches an MHDI higher than 0.4 and school attendance higher than 0.25. By 2000, on the other hand, the yellow dots have largely moved into the quadrant marked by school attendance higher than 0.5 and MHDI higher than 0.4. It is also worth noting that no class has representatives in a quadrant marked by high school attendance and low MHDI, which reinforces the importance of this component in the overall improvement of the MHDI.
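The quadrant reading above can also be checked by counting. The Python sketch below assigns each municipality to a quadrant using the cut-offs mentioned in the text for 2000 (school attendance 0.5 and MHDI 0.4) and tallies the result. The function names and data points are made up for illustration and are not taken from the census.

```python
from collections import Counter


def quadrant(attendance, mhdi, attendance_cut=0.5, mhdi_cut=0.4):
    """Name the quadrant a municipality falls into for the given cut-offs."""
    horizontal = "high attendance" if attendance > attendance_cut else "low attendance"
    vertical = "high MHDI" if mhdi > mhdi_cut else "low MHDI"
    return f"{horizontal}, {vertical}"


def quadrant_counts(points, **cuts):
    """Count how many (attendance, MHDI) points fall into each quadrant."""
    return Counter(quadrant(a, m, **cuts) for a, m in points)


# Made-up (school attendance, MHDI) pairs for the high variation class.
high_variation_1991 = [(0.10, 0.22), (0.15, 0.30), (0.20, 0.35), (0.24, 0.38)]
high_variation_2000 = [(0.55, 0.45), (0.60, 0.50), (0.52, 0.42), (0.45, 0.41)]

counts_1991 = quadrant_counts(high_variation_1991)
counts_2000 = quadrant_counts(high_variation_2000)

for year, counts in (("1991", counts_1991), ("2000", counts_2000)):
    print(year, dict(counts))
```

An empty "high attendance, low MHDI" quadrant in real data would be the numerical counterpart of the observation made from the graph.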

Population aged 5 to 6 attending school

As mentioned before, the percentage of children between 5 and 6 years of age attending school is part of the school attendance indicator for young people. According to the algorithm, this is the second most important variable in forming the groups by variation of the MHDI. So, let's look at a figure associated with this indicator.

Influence of 5 to 6-year-old students in MHDI variation between 1991 and 2000. Image by author

It is easy to see a displacement of the municipalities to the right, especially for the cities with medium and high variation in their MHDI. However, this displacement did not have as strong an impact on the improvement of the MHDI as the one observed for the school attendance index of the young population. In all groups there are municipalities in the quadrant that combines medium and high school attendance rates with medium and low MHDI.

Comparison between 2000 and 2010

Ten more years of progress brought even more blue dots and fewer red warning points to the MHDI maps. More importantly, the groups formed by the variation in MHDI between 2000 and 2010 show a clear regional pattern.

Maps and the MHDI variation between 2000 and 2010. Image by the author.

The maps above show practically no MHDI points classified as low development in 2010. The South and Southeast regions concentrate the municipalities with the smallest variation between the two censuses, most probably because medium and high MHDI values were already more frequent there, leaving less room for growth. The Northeast region, on the other hand, concentrates the municipalities with the highest variation in MHDI.

This transformation over approximately 20 years invites a direct comparison between 1991 and 2010, which makes the contrast between colors even stronger. Since we have the data, here it is.

Maps and the MHDI variation between 1991 and 2010. Image by the author.

But what about the important variables for the change between 2000 and 2010? Again, all the important factors are associated with education. And again, the school attendance of the young population is the most important, followed now by the rate of students aged 11 to 13 in elementary school. So let's go to the graphs.

School Attendance of the Young — Part II

See below what the dots and colors tell us.

Influence of school attendance in MHDI variation between 2000 and 2010. Image by the author

As expected, the starting point for all classes is much higher than in the previous decade's comparison. Once again, the graph highlights the high correlation between school attendance and MHDI for all variation classes.
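The correlation that the graph suggests visually can be quantified with Pearson's coefficient, computed separately for each variation class. The Python sketch below is a minimal illustration with made-up rows and a hypothetical data layout; the article reports no correlation values, so none are implied here.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up rows: (variation class, school attendance, MHDI).
rows = [
    ("low", 0.70, 0.72), ("low", 0.75, 0.74), ("low", 0.80, 0.78),
    ("medium", 0.55, 0.60), ("medium", 0.62, 0.65), ("medium", 0.70, 0.69),
    ("high", 0.45, 0.52), ("high", 0.55, 0.58), ("high", 0.65, 0.66),
]

# Group the rows by variation class.
grouped = defaultdict(lambda: ([], []))
for variation_class, attendance, mhdi in rows:
    grouped[variation_class][0].append(attendance)
    grouped[variation_class][1].append(mhdi)

by_class = {name: pearson(xs, ys) for name, (xs, ys) in grouped.items()}
for name, r in by_class.items():
    print(f"{name} variation: r = {r:.3f}")
```

Computing the coefficient per class rather than over all municipalities at once avoids mixing the within-class relationship with the differences between classes.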

Rate of pupils between 11 and 13 years of age in elementary school

Again, an indicator that is part of the school attendance rate of young people comes second in defining the MHDI variation groups. Let's take a look at the graph.

Influence of 11 to 13-year-old students in MHDI variation between 2000 and 2010. Image by the author

The low and medium variation classes are compressed when comparing 2000 with 2010, and for the medium variation municipalities an upward displacement of the MHDI can be clearly seen. For the high variation municipalities, on the other hand, the points move to the right as much as they move upward. This shows how important it was for the MHDI improvement observed in 2010 to have children of the correct school age in elementary education.

The algorithms used in this analysis helped uncover hidden patterns among the nearly 17000 rows and 230 columns that make up the MHDI database. In this text I highlight only a subset of the most important variables.

Two patterns became clear: the high correlation between the presence of young people in school and the MHDI, and the importance of consistency between age and school stage. In the comparison between 2000 and 2010, the strong regional component that characterizes the division between groups of municipalities based on changes in their MHDI also stands out.

In general, the advances were relevant in several other variables as well. A more attentive researcher might want to explain, for example, the strong reduction in infant mortality, the improvement in income, and so on. There is still a universe of data to be explored. This moment that precedes the resumption of the census seems opportune for revisiting the advances that Brazil achieved until 2010 and contrasting them with the expectations in a scenario of great instability.

If you want to get your hands on the MHDI data, the NGO Base dos Dados has made them available for easy consumption at this address. If you are interested in how I used the algorithms or built the graphs, the code, written in R, is available on my GitHub.

Doctor in Business Administration from UNB (2014). Data scientist who researches and implements products for transparency in the Brazilian public sector.