Statistical Indexes for Measuring Economic Inequality

How to calculate inequality indexes using Python

Photo by Vinay Darekar on Unsplash

1. Introduction

Inequality is one of the topics that evokes immense interest among economists. It essentially concerns the distribution of the benefits of economic growth. High inequality is often said to be self-reinforcing: the rich, having more political power, use it to promote their own interests and entrench their relative position in society. The negative effects of exclusion and unequal opportunity become permanent, perpetuating even greater inequality. This can give rise to social unrest that threatens the smooth functioning of the political economy. Measuring inequality is therefore of utmost importance in welfare economics.

Economists have used several indexes to study the connection between growth and inequality. Common statistical measures include the coefficient of variation, the Lorenz curve, the Gini coefficient and the Theil index. Each measure has its own merits and shortcomings. Although these are very common indicators in economics, there is a lack of adequate resources illustrating their application in general-purpose programming languages such as R and Python. This piece is a modest attempt to fill that gap by discussing their implementation in Python alongside a practical use case.

2. Data Collection and Processing

We will work with per capita GDP (constant 2017 PPP$) from the World Development Indicators database of the World Bank. This data and a host of other indicators are easily available through the World Bank’s Python API, WBGAPI. For more on this, I encourage you to check out the documentation available on PyPI (https://pypi.org/project/wbgapi/).
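As a rough sketch of how such a table might be pulled with WBGAPI, the function below requests one indicator for a range of years. The indicator code `NY.GDP.PCAP.PP.KD`, the function name `fetch_pcgdp_wide`, the 2000 to 2019 window and the decision to drop countries with missing years are all assumptions made for illustration, not necessarily the processing used for this article.

```python
# Assumed indicator code for GDP per capita, PPP (constant international $)
INDICATOR = "NY.GDP.PCAP.PP.KD"


def fetch_pcgdp_wide(start=2000, end=2019):
    """Return a countries x years dataframe of per capita GDP (assumed layout)."""
    # Imported lazily so this module still loads where wbgapi is not installed
    import wbgapi as wb

    df = wb.data.DataFrame(
        INDICATOR,
        time=range(start, end + 1),
        skipAggs=True,         # drop regional and income-group aggregates
        numericTimeKeys=True,  # columns become 2000, 2001, ... instead of 'YR2000'
    )
    # Keep only countries observed in every year (an assumption of this sketch)
    return df.dropna()
```

Calling `pcgdp_wide = fetch_pcgdp_wide()` needs network access and should give one row per country and one column per year, which is the shape the later `apply(..., axis=0)` calls rely on.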

The processed dataframe looks as shown below:

Processed Dataframe: Image by Author

3. Calculation of Statistical Indexes

3.1. Coefficient of Variation

The coefficient of variation (CV) is a statistical measure of the relative dispersion of data points in a data series around the mean. In plain English, the coefficient of variation is simply the ratio between the standard deviation and the mean.

The SciPy library provides a function for easy implementation of this measure.

from scipy import stats

# CV of per capita GDP across countries, computed column-wise (one value per year)
cv = pcgdp_wide.apply(stats.variation, axis=0)
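To make the ratio concrete, here is a small dependency-free sketch that computes the same quantity by hand. The function name and the sample numbers are made up for illustration. It uses the population standard deviation, which matches the default (`ddof=0`) of `scipy.stats.variation`.

```python
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    mean = statistics.fmean(values)
    return statistics.pstdev(values) / mean


# Made-up incomes, purely for illustration
sample = [10.0, 20.0, 30.0, 40.0]
cv_sample = coefficient_of_variation(sample)
print(round(cv_sample, 4))
```

Because the result is a ratio, it is unit-free: multiplying every income by the same constant leaves the CV unchanged.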

3.2. Lorenz Curve

The Lorenz curve was introduced by Max Lorenz in 1905. We can construct the Lorenz curve for any given year by sorting countries by per capita GDP and comparing the cumulative proportion of countries with their cumulative share of total income. For that, we generate two cumulative series.

  • First, we sort countries by their per capita GDP in ascending order.
  • Second, we compute the percentage of countries that are below each observation.
  • Third, we find the income of each country as a percentage of the total income of all countries put together.
  • Fourth, we construct the cumulative sum of income percentages.
  • Fifth, we plot the Lorenz curve by plotting the cumulative percentage of countries against the cumulative sum of income percentages obtained in steps 2 and 4.
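The five steps above can be sketched in plain Python as follows. The function name `lorenz_points` and the sample values are hypothetical. The sketch also prepends the origin (0, 0) so that the curve starts at the bottom-left corner, which is a convenience added here rather than one of the listed steps.

```python
def lorenz_points(values):
    """Return (cumulative share of countries, cumulative share of income)."""
    ordered = sorted(values)            # step 1: sort ascending
    n = len(ordered)
    total = sum(ordered)
    cum_pop = [0.0]                     # start the curve at the origin
    cum_income = [0.0]
    running = 0.0
    for rank, value in enumerate(ordered, start=1):
        running += value                # steps 3 and 4: accumulate income
        cum_pop.append(rank / n)        # step 2: share of countries so far
        cum_income.append(running / total)
    return cum_pop, cum_income


# Made-up per capita GDP values for four countries
pop_share, income_share = lorenz_points([4, 1, 3, 2])
print(pop_share)
print(income_share)
```

For step 5, passing the two lists to a plotting call such as `plt.plot(pop_share, income_share)`, together with the 45-degree line, would draw the curve for one year.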

We generate the Lorenz curve for each year from 2000 to 2019 and combine the frames into a GIF file using the ScreenToGif utility.

Animation showing Lorenz Curve for years 2000–2019: Image by Author

3.3. Gini Coefficient

The Gini coefficient was developed by the Italian statistician Corrado Gini in 1912 as an aggregate measure of income inequality. It is the ratio of the area between the Lorenz curve and the line of perfect equality (the 45-degree line) to the total area under that line. Although frequently used, Gini is criticised for being a single large-scale aggregate, which makes it weak at capturing inequality at different levels of the income distribution.

Image by BenFrantzDale on Wikipedia

Mathematically, the Gini coefficient is defined as half the mean of absolute differences between all pairs of individuals, relative to the mean, for any measure such as income. The minimum value is 0, when all measurements are equal; the theoretical maximum is 1, for an infinitely large set of observations in which all measurements except one have a value of 0, which is the ultimate inequality.
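One standard way to write this definition is given below. The notation is assumed here: x_i is the per capita GDP of country i, n is the number of countries and x̄ is their mean.

```latex
G = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} \lvert x_i - x_j \rvert}{2 n^{2} \bar{x}},
\qquad
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
```

With n observations of which all but one are 0, this expression evaluates to (n - 1)/n, which approaches 1 as n grows.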

Frightening as this formula may look, its Python implementation is a cakewalk. We can use either the pysal or the quantecon library to calculate the Gini coefficient.

# Method 1: Using the pysal library (the .g attribute holds the Gini value)
from pysal.explore import inequality
gini = pcgdp_wide.apply(lambda x: inequality.gini.Gini(x.values).g, axis=0)

# Method 2: Using the quantecon library
import quantecon as qe
gini = pcgdp_wide.apply(lambda x: qe.gini_coefficient(x.values), axis=0)
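To see what such library calls compute, the pairwise definition can also be written out directly. This is a minimal sketch with a hypothetical function name and made-up inputs. It loops over every pair, so it is only practical for small samples, and it is not how pysal or quantecon implement the calculation.

```python
def gini_pairwise(values):
    """Gini coefficient as half the relative mean absolute difference."""
    n = len(values)
    mean = sum(values) / n
    # Sum of |x_i - x_j| over all ordered pairs (i, j)
    total_abs_diff = sum(abs(a - b) for a in values for b in values)
    return total_abs_diff / (2 * n * n * mean)


print(gini_pairwise([5, 5, 5, 5]))   # perfectly equal incomes
print(gini_pairwise([0, 0, 0, 1]))   # one country holds everything
print(gini_pairwise([1, 2, 3, 4]))
```

With four observations the most unequal case gives (n - 1)/n = 0.75 rather than 1, in line with the maximum of 1 being reached only for an infinitely large sample.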

3.4. Theil Index

The Theil index was proposed by the Dutch econometrician Henri Theil of Erasmus University Rotterdam. In plain English, it is the average of logarithms of income shares weighted by income shares.

Conceptually, you may find that this metric resembles a term called entropy from high school Physics/Chemistry. That’s correct. It is closely related to the entropy of the income distribution: it measures how far the observed distribution falls short of the maximum entropy, which is reached when incomes are spread perfectly evenly across the population. The Theil index is available in PySAL’s inequality module.

# Using pysal
from pysal.explore import inequality
theil = pcgdp_wide.apply(lambda x: inequality.theil.Theil(x).T, axis=0)
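For intuition, the index can be sketched without any library as the sum of s_i · ln(n · s_i), where s_i is the income share of country i and n is the number of countries. The function name and sample values are hypothetical, and treating zero shares as contributing nothing is an assumption of this sketch that may differ from how PySAL handles zeros.

```python
import math


def theil_index(values):
    """Theil T index: sum of s_i * ln(n * s_i) over income shares s_i."""
    n = len(values)
    total = sum(values)
    t = 0.0
    for value in values:
        share = value / total
        if share > 0:  # assumption: treat 0 * ln(0) as 0
            t += share * math.log(n * share)
    return t


print(theil_index([2, 2, 2]))      # equal incomes
print(theil_index([0, 0, 0, 1]))   # one country holds everything
```

The index is 0 under perfect equality and reaches ln(n) when a single country holds all income, so unlike Gini its upper bound depends on the sample size.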

4. Results

We put all three inequality indexes together in a dataframe to observe their variation over the 20-year timespan using the code snippet below.

import pandas as pd
import matplotlib.pyplot as plt

# Combine the yearly series into one dataframe and plot them together
df = pd.DataFrame({'year': range(2000, 2020),
                   'CV': cv,
                   'gini': gini,
                   'theil': theil})
df.set_index('year').plot()
plt.xticks(range(2000, 2020, 2))
plt.show()

Variation of per capita GDP based global inequality indexes across time: Image by Author

We notice that all indexes broadly capture the same trend: inequality in the per capita incomes of countries is decreasing. This indicates that the gap between rich and poor countries is narrowing as far as per capita GDP is concerned. As an exercise, try exploring the inequality trends by dividing the countries into four groups: high income, upper middle income, lower middle income and low income. You can use the World Bank’s country classification as given on this page.

In conclusion, we discussed some commonly used statistical indexes for measuring inequality, namely the CV, Lorenz curve, Gini and Theil, and their implementation in Python. A few libraries make it easy to compute these indexes with a one-line function call.

Disclaimer: Views are personal.

References

  1. https://python.quantecon.org/wealth_dynamics.html
  2. https://towardsdatascience.com/measuring-statistical-dispersion-with-the-gini-coefficient-22e4e63463af
  3. https://geographicdata.science/book/notebooks/09_spatial_inequality.html
  4. https://www.statsdirect.com/help/nonparametric_methods/gini_coefficient.htm
  5. https://blogs.worldbank.org/opendata/introducing-wbgapi-new-python-package-accessing-world-bank-data
  6. https://pypi.org/project/wbgapi/
  7. https://voxeu.org/content/why-inequality-matters
