The world’s leading publication for data science, AI, and ML professionals.

Exploring the demography and power sector of South Asia

Analysis using geopandas, matplotlib, networkx, and Plotly in Python

Suppose you are at a global conference with participants from all over the world for a particular cause. If you meet and interact with a random individual at this conference, there is a high chance that the person is from South Asia. According to population statistics, nearly one in every four people in the world is from South Asia. The region comprises eight countries: Afghanistan, Bangladesh, Bhutan, India, Maldives, Nepal, Pakistan, and Sri Lanka. Together, these countries form a regional intergovernmental organisation called the South Asian Association for Regional Cooperation (SAARC).

Coming from one of the countries in this region, I have been curious to analyse its demography for a while. With a background in energy systems, I am also interested in analysing what the power systems of the countries in the region look like and how they differ from each other. So this holiday season, with travel ruled out by Covid-related restrictions, I sat down in front of my laptop, inspected the numbers from some open sources, and started to crunch them using my favourite programming language, Python.

Here’s how it went. I am going to describe the insights and the code in parallel.


Demographics

I still recall learning in school, two decades ago, that the population of the world was six billion. Fast forward to today, and it is only a matter of months or years before the global population hits eight billion.

Population of South Asian countries plotted as stacked bar plots for 1990 and 2019. The shaded region between the bars represents the population growth of each country between the two years. Data based on World Bank Open Data 2021.

Within South Asia, India accounted for three-quarters of the total population in 2020, followed by Pakistan and Bangladesh. The remaining five countries in the region together accounted for less than five percent of the total population. The population of South Asia has grown by 65% in the past three decades. The majority of this growth can be attributed to India, as reflected in the shaded region between the two stacked columns in the plot above. However, all the countries in the region have seen a steady increase in population over this period.
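To put the 65% figure in perspective, it can be converted into an average annual growth rate. The sketch below uses only that percentage and assumes the 1990 to 2020 span; the function name growth_rates is hypothetical.

```python
def growth_rates(start, end, years):
    """Return (total growth, compound annual growth rate) between two values."""
    ratio = end / start
    total = ratio - 1
    # Constant yearly rate that turns `start` into `end` after `years` years
    cagr = ratio ** (1 / years) - 1
    return total, cagr


# A 65% increase over three decades, expressed relative to a start value of 1
total, cagr = growth_rates(1.0, 1.65, 30)
print(f"total growth: {total:.0%}, average annual growth: {cagr:.2%}")
```

With these inputs, the compound annual growth rate comes out at roughly 1.7% per year.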

I obtained the data for this analysis from World Bank Open Data and stored it in the df_pop dataframe.

Snippet of df_pop dataframe containing population data of eight countries in South Asia. Data based on World Bank Open Data 2021.

In this plot, the bars for 1990 and 2020 sit at positions 0 and 1 on the x-axis, respectively. Therefore, to fill in the area between the two stacked columns, one needs to specify the exact positions for x, y1, and y2, as well as the colour representing each country.
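A minimal sketch of this idea with matplotlib's bar and fill_between, assuming df_pop has been arranged with one row per country and one column per year (the real layout of df_pop may differ). The function plot_stacked_growth, the colours dictionary, and the demo numbers are hypothetical and are not World Bank data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_stacked_growth(df_pop, colours, width=0.4):
    """Stack the two years as bars at x = 0 and x = 1 and shade each
    country's band between the bars with fill_between."""
    years = list(df_pop.columns)
    fig, ax = plt.subplots()
    bottoms = np.zeros(len(years))
    for country, row in df_pop.iterrows():
        values = row.to_numpy(dtype=float)
        ax.bar([0, 1], values, width=width, bottom=bottoms,
               color=colours[country], label=country)
        # Right edge of the first bar to the left edge of the second bar
        x = [width / 2, 1 - width / 2]
        ax.fill_between(x, bottoms, bottoms + values,
                        color=colours[country], alpha=0.3)
        bottoms = bottoms + values
    ax.set_xticks([0, 1])
    ax.set_xticklabels(years)
    ax.legend()
    return fig, ax


# Hypothetical populations (millions) for two made-up countries
demo = pd.DataFrame({"1990": [10, 5], "2020": [15, 8]},
                    index=["Country A", "Country B"])
fig, ax = plot_stacked_growth(demo, {"Country A": "tab:blue",
                                     "Country B": "tab:orange"})
```

Because fill_between interpolates linearly between the two x positions, each shaded band joins the top and bottom of a country's 1990 segment to those of its 2020 segment.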

Population distribution by gender

Next, I wanted to check how the population is distributed by gender across South Asia. I wanted to plot this as a bar plot over the map of each country in the region.

The initial step in this process was to get the geometry of the region using geopandas. An introduction to geopandas and its applications is provided in this story. First, I generated the world geodataframe by reading a dataset that ships with the package. Next, I took a subset of world called saarc, which comprised the geometries of the South Asian countries only. Because Maldives is a small island nation in the Indian Ocean, its geometry was unfortunately unavailable in this dataset.

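import geopandas as gpd

#naturalearth_lowres ships with geopandas < 1.0 only (the gpd.datasets module was removed in 1.0)
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
saarc_countries = ["Nepal","India","Bangladesh","Bhutan","Afghanistan","Pakistan","Sri Lanka","Maldives"]
#select only the countries in the list from world, indexed by country name for the join below
saarc = world[world.name.isin(saarc_countries)].set_index("name")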

The gender-based population data for 2020, derived from World Bank Open Data and stored in df, was then joined to saarc. Since join aligns rows on the index, both dataframes need the country names as their index. The result looked as follows:

saarc = saarc.join(df)
Snippet of saarc geopandas dataframe, where the male and female population shares were appended as columns on the right. Population data based on World Bank Open Data 2021.
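A small pandas-only illustration of how join behaves here: it is a left join on the index, so every country in the left dataframe is kept, and a country missing from the right dataframe simply gets NaN. The dataframes and shares below are hypothetical stand-ins for saarc and df.

```python
import pandas as pd

# Hypothetical stand-ins for saarc and df, both indexed by country name
saarc_demo = pd.DataFrame({"continent": ["Asia", "Asia", "Asia"]},
                          index=["Nepal", "India", "Bhutan"])
df_demo = pd.DataFrame({"Male": [0.48, 0.52], "Female": [0.52, 0.48]},
                       index=["Nepal", "India"])

# Left join on the index: Bhutan has no row in df_demo, so its shares are NaN
joined = saarc_demo.join(df_demo)
print(joined)
```

If the country names lived in an ordinary column instead of the index, join would align on the default integer index and attach shares to the wrong countries, which is why set_index is needed first.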

Next, I created inset_axes within parent_axes for individual countries, specifying the width and height as a percentage of parent_axes. For the location of the bounding box, I used the centroid of each country's geometry. Then, I plotted the juxtaposed bars of the male and female population shares within the inset_axes created for each country.

The resulting plot is as shown:

Male and female population shares plotted as bars on the map of each country in South Asia. Data based on World Bank Open Data 2021.

The resulting plot depicted the uneven distribution of population by gender across South Asia. In 2020, females made up a larger share of the population than males in Nepal and Sri Lanka. In Bangladesh, the split was almost 50:50. In Afghanistan, Bhutan, India, and Pakistan, males made up a larger share of the population than females.

Electricity generation mix in South Asia

My research interests lie in understanding the energy supply and demand structure of different countries, and more importantly, how the countries meet their electricity demand. The data for this analysis was sourced from the Our World in Data website.

Needless to say, India's total electricity generation and installed power plant capacity dwarf those of any other country in the region, owing to its giant size and population. Instead, I wanted to compare what the electricity mix of each country looked like. I used the same inset-axes approach as for the previous plot, this time drawing a pie chart for each country.
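When several pie charts sit side by side, each source should have the same colour in every country; otherwise the default colour cycle assigns colours by position. A minimal sketch, assuming each country's generation is a pandas Series indexed by source. The colour table, the helper draw_mix_pie, and the demo values are hypothetical, not Our World in Data figures.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# One fixed colour per source so that every country's pie shares one legend
SOURCE_COLOURS = {"Coal": "black", "Gas": "tab:orange", "Oil": "tab:brown",
                  "Nuclear": "tab:purple", "Hydro": "tab:blue",
                  "Solar": "gold", "Wind": "tab:green"}


def draw_mix_pie(ax, generation):
    """Draw one country's generation mix as a pie and return the shares.

    Sources with zero generation are dropped, and colours are looked up in
    SOURCE_COLOURS instead of taken from the default colour cycle.
    """
    mix = generation[generation > 0]
    ax.pie(mix.to_numpy(), colors=[SOURCE_COLOURS[s] for s in mix.index])
    ax.set_aspect("equal")  # keep the pie circular inside a small inset
    return mix / mix.sum()


fig, ax = plt.subplots()
# Hypothetical generation (TWh) for an illustrative country
shares = draw_mix_pie(ax, pd.Series({"Coal": 0.0, "Gas": 60.0,
                                     "Oil": 15.0, "Hydro": 5.0}))
```

In the map version, ax would be the inset axes created at each country's centroid, as in the bar-chart case.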

Electricity generation mix plotted as pie-charts over the map of respective country in South Asia. Data based on Our World in Data 2021.

The plot demonstrated that in 2020, Bhutan and Nepal relied almost completely on hydropower, a renewable source of electricity. These two countries have significant potential for generating hydroelectricity from the rivers that originate in the Himalayas. Other countries in the region have a relatively lower share of hydro in their electricity mix.

A striking fact that the plot above depicted was the heavy dependence of the other countries on fossil fuels for electricity. For example, Bangladesh, the most densely populated country in the region, sourced almost all of its electricity from natural gas or oil, while India produced three-quarters of its electricity from coal, the most carbon-intensive fuel. In 2020, nuclear power was operational only in India and Pakistan.

With their geographical location, South Asian countries receive abundant solar irradiation throughout the year. Some locations also have good wind potential. To contribute to global climate action, these countries must tap into their renewable energy potential and avoid possible future carbon lock-ins.

Electricity access across South Asia

The countries in South Asia have made significant progress in providing their populations with access to electricity over the past few decades alone, thanks to the efforts of governments, the private sector, utility companies, think-tanks, and international development cooperation.

To compare what this progress looked like in practice, I dived deeper into the electricity access data from World Bank Open Data and combined it with the saarc geodataframe. Based on data availability, I plotted the electricity access rate in the region for 2005 and 2019 as choropleth maps, one subplot per year.
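For the two maps to be comparable, both subplots need the same colour scale and a single shared colour bar. A minimal sketch with geopandas, assuming the geodataframe holds one access-rate column (in percent) per year; the helper plot_access_maps, the colour map, and the two square demo "countries" with made-up rates are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.cm import ScalarMappable
from matplotlib.colors import Normalize
import geopandas as gpd
from shapely.geometry import box


def plot_access_maps(gdf, years, cmap="YlGn", vmin=0, vmax=100):
    """One choropleth per year, sharing a colour scale and one colour bar."""
    fig, axes = plt.subplots(1, len(years), figsize=(5 * len(years), 5),
                             squeeze=False)
    axes = axes[0]
    for ax, year in zip(axes, years):
        # Fixed vmin/vmax: the same colour means the same rate in every panel
        gdf.plot(column=year, ax=ax, cmap=cmap, vmin=vmin, vmax=vmax,
                 edgecolor="grey")
        ax.set_title(year)
        ax.set_axis_off()
    sm = ScalarMappable(norm=Normalize(vmin=vmin, vmax=vmax), cmap=cmap)
    sm.set_array([])  # needed by older matplotlib versions
    fig.colorbar(sm, ax=list(axes), label="Access to electricity (% of population)")
    return fig, axes


# Two hypothetical square "countries" with made-up access rates
demo = gpd.GeoDataFrame({"2005": [40.0, 85.0], "2019": [90.0, 100.0]},
                        geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)])
fig, axes = plot_access_maps(demo, ["2005", "2019"])
```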

Electricity access rate in South Asian countries in 2005 and 2019. The color bar on the right is common for both subplots. Data based on World Bank Open Data 2021.

The plot showed that the electricity access rate improved significantly in South Asia between 2005 and 2019. Except for Pakistan, the electricity access rate has reached 90 percent or above in all countries. Bhutan and Sri Lanka have already reached a 100 percent electrification rate. In all the countries, the urban population has much higher access to electricity than the rural population. Insufficient infrastructure development and limited technical capacity are among the reasons for the slower development of the power sector in certain countries.

Interconnectors

The synchronous grid of Continental Europe covers the territory of the European Network of Transmission System Operators (ENTSO-E), which connects all or parts of the countries in Europe. This type of interconnection adds flexibility to the power system, as low-cost electricity generated in a region with high supply can be transmitted to another region with high demand. Not only does it help maintain the balance between supply and demand at different spatial and temporal levels, but it also ensures security of supply and helps reduce costs and emissions.

Currently, there are limited existing and proposed cross-border interconnections between countries in South Asia, such as Bhutan-India, India-Nepal, and India-Bangladesh. While setting up the infrastructure for interconnectors comes with its own costs and challenges, a study suggests that the high benefit-cost ratio can outweigh these challenges and be instrumental in the long-term development of the region.

I envisioned a hypothetical interconnection between different countries in the region and plotted it on the map, for demonstration purposes only. The hypothetical interconnector data, which I chose arbitrarily, is as follows:

Snippet of interconnectors dataframe containing hypothetical interconnection between power systems in South Asia. Based on the author's arbitrary selection (for demonstration purpose only).

First, I converted it into a directed graph using the networkx package in Python:

import networkx as nx

G = nx.from_pandas_edgelist(interconnectors, source = "From", target = "To", create_using = nx.DiGraph())
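If the capacity is kept on the edges as an attribute, the edge widths can be computed from the graph rather than typed in by hand, which guarantees they follow the order of G.edges() that draw_networkx uses. A minimal sketch: the "From" and "To" columns follow the dataframe above, but the "Capacity" column name, its values, and the 20-point maximum width are hypothetical.

```python
import networkx as nx
import pandas as pd

# Hypothetical interconnector table with a made-up capacity column
interconnectors = pd.DataFrame({
    "From": ["Bhutan", "Nepal", "India", "India"],
    "To": ["India", "India", "Bangladesh", "Sri Lanka"],
    "Capacity": [2000, 1000, 1000, 500],
})

# edge_attr stores the capacity on each edge of the directed graph
G = nx.from_pandas_edgelist(interconnectors, source="From", target="To",
                            edge_attr="Capacity", create_using=nx.DiGraph())

# Widths in the same order as G.edges(), scaled so the largest edge is 20 pt
capacities = [cap for _, _, cap in G.edges(data="Capacity")]
widths = [20 * cap / max(capacities) for cap in capacities]
```

The resulting list can be passed as width=widths to nx.draw_networkx. The edge order of a DiGraph follows the order in which source nodes were added, which can differ from the row order of the dataframe, so a hand-written width list is easy to misalign.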

Next, I created a Basemap instance with the Mercator projection (a cylindrical map projection) at low resolution. I specified the longitudes and latitudes of the lower-left and upper-right corners so that the map covers all the countries.

from mpl_toolkits.basemap import Basemap
m = Basemap(projection = "merc", #Mercator projection
 llcrnrlon = 60, #longitude of lower left corner
 llcrnrlat = 5, #latitude of lower left corner
 urcrnrlon = 100, #longitude of upper right corner
 urcrnrlat = 40, #latitude of upper right corner
 lat_ts = 1, #latitude of true scale
 resolution = "l",
 suppress_ticks = False)

I wanted to plot the node for each country at the geometric centroid of that country. Therefore, I created a dictionary of node positions by converting each centroid's longitude and latitude into the projected x/y coordinates (in metres) of the basemap.

positions = {}
for index, row in saarc.iterrows():
    #Convert the centroid's longitude/latitude into Basemap's projected x/y coordinates
    positions[index] = m(row.geometry.centroid.x, row.geometry.centroid.y)

I got the following positions dictionary:

{'India': (2178387.5972047374, 2063020.0475059198),
 'Bangladesh': (3365125.8765982683, 2173785.0707132816),
 'Bhutan': (3387861.379178301, 2616406.810286945),
 'Nepal': (2669735.152648369, 2718429.0071621696),
 'Pakistan': (1046628.9002186056, 2939091.5173554155),
 'Afghanistan': (676705.6727856797, 3447841.160892035),
 'Sri Lanka': (2297740.670615352, 302122.0644451928)}
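These positions are Mercator coordinates in metres, measured from the lower-left corner of the basemap (60°E, 5°N), which is why every value is positive. A minimal sketch of that conversion on a sphere, assuming Basemap's default sphere radius of 6,370,997 m and the corner and lat_ts values used above; merc_xy is a hypothetical helper that approximates, rather than reproduces, Basemap's internals.

```python
import math

R = 6370997.0  # assumed sphere radius in metres (Basemap's default rsphere)


def merc_xy(lon, lat, lon0=60.0, lat0=5.0, lat_ts=1.0):
    """Spherical Mercator x/y, shifted so (lon0, lat0) maps to (0, 0)."""
    k = R * math.cos(math.radians(lat_ts))  # scale at the latitude of true scale

    def y_raw(phi):
        return k * math.log(math.tan(math.pi / 4 + math.radians(phi) / 2))

    x = k * math.radians(lon - lon0)
    y = y_raw(lat) - y_raw(lat0)
    return x, y


print(merc_xy(60, 5))   # the lower-left corner sits at the origin
print(merc_xy(80, 20))  # a point roughly in central India
```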

Finally, I plotted the directed graph by specifying the node positions, node sizes, and other parameters. The width of each edge between two countries reflects the hypothetical capacity I assigned to it. I also drew the country boundaries and coastlines and filled the continents, as shown in the code below.

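import matplotlib.pyplot as plt
import networkx as nx

#width takes one value per edge, in the same order as G.edges()
nx.draw_networkx(G, pos = positions,
 node_shape = "o",
 node_color = "r",
 alpha = 0.8,
 node_size = 100,
 arrows = True,
 width = [5, 5, 5, 10, 20, 20, 5, 5],
 edge_color = "blue")
m.drawcountries(linewidth = 0.5)
#m.drawstates(linewidth = 1)
m.drawcoastlines(linewidth = 1)
m.fillcontinents(alpha = 0.5)
#raw string so that \it reaches mathtext and renders "Hypothetical" in italics
plt.title(r"Power system interconnections $(\it{Hypothetical})$")
plt.show()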

This produced the plot of the hypothetical interconnections between countries that I had envisioned.

Hypothetical interconnection between the power systems of countries in South Asia. Red dots represent the nodes, and blue lines represent the edges. The width of each edge reflects the capacity of the interconnector between two countries. Based on the author's arbitrary selection (for demonstration purposes only).

Conclusion

As the world goes through the global energy transition, research, analysis, modelling, and planning are more important than ever. Analyses of population demographics and infrastructure (e.g. the power system, as depicted in this story) provide insights that are important for identifying development needs, policy recommendations, and investment gaps. These data and insights are vital for modelling the development pathway of a country or region in any sector.

Visualisation of demography and power system analysed in this story. Image by Author.

This story analysed the population demographics and power systems of South Asian countries using Python packages such as geopandas, matplotlib, networkx, and pandas. The notebook for this analysis is available in this GitHub repository.

