
1. Introduction
Inequality is one of the topics that evokes immense interest among economists. It essentially concerns the distribution of the benefits of economic growth. High inequality is often said to be self-reinforcing. The rich, having more political power, use it to promote their own interests and entrench their relative position in society. The negative effects of exclusion and lack of equal opportunity become permanent, thereby perpetuating even greater inequality. This can give rise to social unrest that threatens the smooth functioning of the political economy. Therefore, measuring inequality is of utmost importance in welfare economics.
Economists have used several indexes to study the connection between growth and inequality. Some common statistical measures include the coefficient of variation, the Lorenz curve, the Gini coefficient and the Theil index. Each measure has its own merits and shortcomings. Although these indicators are very common in economics, there is a lack of adequate resources illustrating their application in general-purpose programming languages such as R and Python. This piece is a modest attempt to fill this gap by discussing their implementation in Python alongside a practical use case.
2. Data Collection and Processing
We will work with per capita GDP (constant 2017 PPP$) from the World Development Indicators database of the World Bank. This series and a host of other indicators are easily available through the World Bank's Python API, WBGAPI. For more on this, I encourage you to check out the documentation available on PyPI (https://pypi.org/project/wbgapi/).
The processed dataframe looks as shown below:

3. Calculation of Statistical Indexes
3.1. Coefficient of Variation
The coefficient of variation (CV) is a statistical measure of the relative dispersion of data points in a data series around the mean. In plain English, the coefficient of variation is simply the ratio between the standard deviation and the mean.

The SciPy library provides a function, scipy.stats.variation, for easy implementation of this measure.
from scipy import stats
# Apply column-wise: one CV value per column of pcgdp_wide
cv = pcgdp_wide.apply(stats.variation, axis=0)
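To see what the library call is doing, here is a minimal dependency-free sketch of the same ratio. The function name coefficient_of_variation and the sample figures are made up for illustration; the ddof=0 default mirrors the population standard deviation that scipy.stats.variation uses by default.

```python
import math
from statistics import fmean


def coefficient_of_variation(values, ddof=0):
    """Return standard deviation / mean for a sequence of numbers.

    ddof=0 gives the population standard deviation (the default in
    scipy.stats.variation); ddof=1 would give the sample version.
    """
    values = [float(v) for v in values]
    n = len(values)
    mean = fmean(values)
    variance = sum((v - mean) ** 2 for v in values) / (n - ddof)
    return math.sqrt(variance) / mean


# Hypothetical per capita GDP figures for four countries (illustrative only)
sample_incomes = [1000, 2000, 3000, 4000]
cv_sample = coefficient_of_variation(sample_incomes)
print(round(cv_sample, 4))
```

A value of 0 means every country has the same income; larger values mean incomes are more spread out relative to the mean.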
3.2. Lorenz Curve
The Lorenz curve was introduced by Max Lorenz in 1905. We can construct the Lorenz curve for any given year by computing the proportion of countries that lie below each observation when countries are sorted by per capita GDP, and pairing it with the cumulative share of income. For that, we generate a cumulative series.
- First, we sort countries by their per capita GDP in ascending order.
- Second, we compute the percentage of countries that are below each observation.
- Third, we find the income of each country as a percentage of the total income of all countries put together.
- Fourth, we construct the cumulative sum of income percentages.
- Fifth, we plot the Lorenz curve by plotting the cumulative percentage of countries (step 2) against the cumulative sum of income percentages (step 4).
We generate the Lorenz curve for every year from 2000 to 2020 and combine the plots into a GIF file using the ScreenToGif utility.
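The five steps can be sketched for a single year as follows. This is only an illustration under stated assumptions: the function name lorenz_points and the sample figures are hypothetical, and the sketch stops at the coordinates rather than reproducing the plotting or GIF step.

```python
from itertools import accumulate


def lorenz_points(incomes):
    """Return (population_share, cumulative_income_share) for a Lorenz curve."""
    ordered = sorted(float(v) for v in incomes)           # step 1: sort ascending
    n = len(ordered)
    total = sum(ordered)
    population_share = [i / n for i in range(1, n + 1)]   # step 2: share of countries at or below each one
    income_share = [v / total for v in ordered]           # step 3: each country's share of total income
    cumulative_income = list(accumulate(income_share))    # step 4: running sum of income shares
    # Prepend the origin so the curve starts at (0, 0)
    return [0.0] + population_share, [0.0] + cumulative_income


# Hypothetical per capita GDP figures for four countries (illustrative only)
sample_incomes = [4000, 1000, 3000, 2000]
pop_share, inc_share = lorenz_points(sample_incomes)
```

For step 5, plotting pop_share against inc_share (for example with matplotlib's plt.plot) together with the 45-degree line from (0, 0) to (1, 1) gives the curve for that year. Repeating this for each year's column yields one frame per year.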

3.3. Gini Coefficient
The Gini coefficient was developed by the Italian statistician Corrado Gini in 1912 as an aggregate measure of income inequality. It is the ratio of the area between the Lorenz curve and the line of perfect equality (the 45-degree line) to the total area under that line. Although frequently used, Gini is criticised for being a single aggregate number, which makes it weak at describing inequality at different levels of the income distribution.

Mathematically, Gini is defined as half the relative mean absolute difference: the mean of absolute differences between all pairs of individuals for any measure (e.g. income), divided by twice the mean of that measure. The minimum value is 0, when all measurements are equal. The theoretical maximum is 1, approached for an infinitely large set of observations in which all measurements except one have a value of 0, which is the ultimate inequality.
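One standard way to write this definition, with x_i the income of unit i, n the number of units and x̄ the mean income, is:

```latex
G = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} \lvert x_i - x_j \rvert}{2 n^{2} \bar{x}}
```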

Frightening as the formula may look, its Python implementation is a cakewalk. We can use either the pysal or the quantecon library to calculate the Gini coefficient.
# Method 1: Using pysal library (the Gini object stores the coefficient in its .g attribute)
from pysal.explore import inequality
gini = pcgdp_wide.apply(lambda x: inequality.gini.Gini(x.values).g, axis=0)
# Method 2: Using quantecon library
import quantecon as qe
gini = pcgdp_wide.apply(lambda x: qe.gini_coefficient(x.values), axis=0)
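For intuition, the pairwise definition can also be written out directly. The sketch below is a plain-Python illustration, not how pysal or quantecon compute it internally; the function name and the sample figures are hypothetical. It sums absolute differences over all ordered pairs and divides by 2·n²·mean.

```python
def gini_coefficient(incomes):
    """Gini as half the relative mean absolute difference (O(n^2) pairwise sum)."""
    values = [float(v) for v in incomes]
    n = len(values)
    mean = sum(values) / n
    # Sum of |x_i - x_j| over all ordered pairs (i, j)
    abs_diff_sum = sum(abs(a - b) for a in values for b in values)
    return abs_diff_sum / (2 * n * n * mean)


# Hypothetical per capita GDP figures for four countries (illustrative only)
sample_incomes = [1000, 2000, 3000, 4000]
gini_sample = gini_coefficient(sample_incomes)
print(gini_sample)
```

The quadratic pairwise loop is fine for a couple of hundred countries; for large datasets the sorted-value formulations used by the libraries are the better choice.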
3.4. Theil Index
The Theil index was proposed by the Dutch econometrician Henri Theil of Erasmus University Rotterdam. In plain English, it is the sum of the logarithms of each unit's income relative to the mean, weighted by that unit's share of total income.

Conceptually, you may find that this metric resembles a term called entropy from high school Physics/Chemistry. That’s correct. It is built on the entropy of the income distribution: it measures how far the observed distribution falls short of the maximum entropy reached when incomes are spread perfectly evenly across the population. The Theil index is available in PySAL’s inequality module.
# Using pysal
from pysal.explore import inequality
theil = pcgdp_wide.apply(lambda x: inequality.theil.Theil(x).T, axis=0)
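As a cross-check on the library call, the index can be sketched by hand as T = (1/n) Σ (x_i / mean) · ln(x_i / mean). The function name and sample figures below are hypothetical, and zero incomes are handled with the usual convention that 0 · ln(0) counts as 0.

```python
import math


def theil_index(incomes):
    """Theil T index: mean of (x / mean) * ln(x / mean) over all units."""
    values = [float(v) for v in incomes]
    n = len(values)
    mean = sum(values) / n
    total = 0.0
    for v in values:
        ratio = v / mean
        if ratio > 0:  # convention: 0 * ln(0) contributes 0
            total += ratio * math.log(ratio)
    return total / n


# Hypothetical per capita GDP figures for four countries (illustrative only)
sample_incomes = [1000, 2000, 3000, 4000]
theil_sample = theil_index(sample_incomes)
print(round(theil_sample, 4))
```

The index is 0 under perfect equality and reaches ln(n) when a single unit holds all the income, so unlike Gini its upper bound depends on the number of observations.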
4. Results
We put all three inequality indexes together in a dataframe to observe their variation over the 20-year timespan using the code snippet below.
import pandas as pd
import matplotlib.pyplot as plt

# One row per year, one column per inequality index
df = pd.DataFrame({'year': range(2000, 2020),
                   'CV': cv,
                   'gini': gini,
                   'theil': theil})
df.set_index('year').plot()
plt.xticks(range(2000, 2020, 2))
plt.show()

We notice that all indexes broadly capture the same trend: decreasing inequality in the per capita incomes of countries. This indicates that the gap between rich and poor countries is narrowing as far as per capita GDP is concerned.
As an exercise, try exploring the inequality trends by dividing the countries into four groups: high income, upper middle income, lower middle income and low income. You can use the World Bank’s country classification as given on this page.
In conclusion, we discussed some commonly used statistical indexes for measuring inequality, viz. CV, Lorenz curve, Gini and Theil, and their implementation using Python. A few libraries make it easy to work out the values of these indexes with just a one-line function call.
Disclaimer: Views are personal.
References
- https://python.quantecon.org/wealth_dynamics.html
- https://towardsdatascience.com/measuring-statistical-dispersion-with-the-gini-coefficient-22e4e63463af
- https://geographicdata.science/book/notebooks/09_spatial_inequality.html
- https://www.statsdirect.com/help/nonparametric_methods/gini_coefficient.htm
- https://blogs.worldbank.org/opendata/introducing-wbgapi-new-python-package-accessing-world-bank-data
- https://pypi.org/project/wbgapi/
- https://voxeu.org/content/why-inequality-matters