
If you are using Python and want to do data analysis, you will probably use the Pandas library, and for good reason: Pandas is a fast and flexible tool for data manipulation and analysis. However, as someone who is new to Python, I find it useful to go back to the basic building blocks of the language to better learn basic data wrangling. So, as a Python exercise, I will do data analysis in Python without using the Pandas library. We will analyze future population growth using data produced by the United Nations.
We will analyze tabular data, which means we will work with data stored in two-dimensional lists. To manipulate 2D lists, we will make heavy use of simple and nested for-loops, indexing, built-in Python functions such as min() and max(), and list methods such as sort() and append(). These are tools that you will also use within a workflow built on Pandas or other libraries.
The data* was downloaded from Gapminder. The full dataset contained yearly projections for 197 countries from 1800 to 2100. I used a subset with data for all 197 countries from 2020 to 2100 with a 5-year interval.
The dataset and code can be found in this GitHub repo.
Import the data
As you can see under the result, the data is stored in a two-dimensional list, where each row is an element in the list. The table is in a wide format where each ‘row’ is a country and each year is a ‘column’. That means each list within the list contains all population information for one country.
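The original import code is not reproduced here, so the following is only a minimal sketch of how a CSV file can be read into a two-dimensional list with the standard csv module. The sample rows, the country names, and the names read_table and pop_data are assumptions made for illustration; they are not the Gapminder data.

```python
import csv
import io

# Invented stand-in for the CSV file; these are not the real projections.
sample_csv = """country,2020,2025,2030
Country A,1000,1100,1200
Country B,5000,4800,4500
"""


def read_table(file_obj):
    """Read CSV content into a 2D list with one inner list per row."""
    return [row for row in csv.reader(file_obj)]


# io.StringIO lets the sketch run without a file on disk.
pop_data = read_table(io.StringIO(sample_csv))

print(pop_data[0])  # header row with the column names
print(pop_data[1])  # first country; the values are still strings
```

With a real file, the same function could be called inside `with open(path, newline="") as f:`, where path is the location of your own copy of the dataset.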
Create a summarized table
We will start analyzing the population dataset by creating a summary table containing information about the highest and lowest expected population for each country, as well as the relative change in population from today to the year 2100. We will create a table called pop_exp_dev, which will contain the following columns:
- Country
- Lowest projected population
- Year of lowest projected population
- Highest projected population
- Year of highest projected population
- Relative change of population from 2020 to 2100
We create this table in a few steps.
- First, we need to convert the population data to integer values. We do this by using a nested for-loop. The outer loop iterates over each row, and the inner loop iterates over each item in the row and converts each item from string to integer. We start the outer loop at index 1 because we don’t need to convert the first row containing column names. We also start the inner loop at index 1 because the first value of each row contains the country name.
- We create an empty list, pop_exp_dev, in which we will store the new values.
- To find the new values, we use for-loops to iterate over the rows to find the highest and lowest values and append them to our new table. We also find their index values so that we can identify and append the year of those values.
- Finally, we want to find the expected change in population, expressed in percent, for the period 2020–2100. We calculate this with the formula (pop_2100 - pop_2020) / pop_2020 * 100.
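The steps above can be sketched roughly as follows. This is one possible way to write them, not necessarily the code from the repo. The sample rows are invented, and the first and last year columns stand in for 2020 and 2100.

```python
# Invented sample in the wide format described above (header + one row per country).
pop_data = [
    ["country", "2020", "2025", "2030"],
    ["Country A", "1000", "1100", "1200"],
    ["Country B", "5000", "4800", "4500"],
]

# Step 1: convert the population strings to integers.
# Both loops start at index 1 to skip the header row and the country names.
for i in range(1, len(pop_data)):
    for j in range(1, len(pop_data[i])):
        pop_data[i][j] = int(pop_data[i][j])

header = pop_data[0]

# Step 2: an empty list that will hold the summary rows.
pop_exp_dev = []

# Steps 3 and 4: find lowest/highest values, their years, and the relative change.
for row in pop_data[1:]:
    values = row[1:]
    lowest = min(values)
    highest = max(values)
    # values starts at column 1 of the row, so add 1 to look up the year in the header.
    year_lowest = header[values.index(lowest) + 1]
    year_highest = header[values.index(highest) + 1]
    # (last - first) / first * 100; the parentheses matter for the order of operations.
    rel_change = round((values[-1] - values[0]) / values[0] * 100, 1)
    pop_exp_dev.append(
        [row[0], lowest, year_lowest, highest, year_highest, rel_change]
    )

for summary_row in pop_exp_dev:
    print(summary_row)
```

Note that index() returns the first match, so if a country has the same lowest or highest value in several years, the earliest year is reported.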
We now have a table with some summaries of the population data. Viewing the data in a 2D list is not very pleasing to the eye, so for illustration, I show what the table we created looks like if you view it in tabular form.

Subset list for visualization
From this table, we can make a couple of plots to visualize the expected population. Plotting all 197 countries in one plot quickly makes the plot too long and difficult to read. Instead, we can subset parts of the table and make smaller plots. Here, we will subset the countries with the largest population growth and largest population decline for one plot, and we will subset all European countries for another plot.
Subset largest population growth and largest population decline
For the first plot, we simply sort the 2D list by the value of the relative change, stored in column 6 (index 5). We use sorted() to do this and pass a lambda function to the key argument.
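A minimal sketch of this sorting step is shown below. The rows are invented placeholders in the pop_exp_dev layout, and the number of countries taken from each end (n) is an arbitrary choice for illustration.

```python
# Invented rows in the pop_exp_dev layout:
# [country, lowest, year of lowest, highest, year of highest, relative change]
pop_exp_dev = [
    ["Country A", 1000, "2020", 1200, "2030", 20.0],
    ["Country B", 4500, "2030", 5000, "2020", -10.0],
    ["Country C", 300, "2020", 900, "2030", 200.0],
    ["Country D", 700, "2030", 1400, "2020", -50.0],
]

# Sort ascending by the relative change stored at index 5.
sorted_by_change = sorted(pop_exp_dev, key=lambda row: row[5])

n = 1  # how many countries to take from each end (arbitrary here)
largest_decline = sorted_by_change[:n]   # most negative change first
largest_growth = sorted_by_change[-n:]   # most positive change last

# Combined subset for plotting.
growth_and_decline = largest_decline + largest_growth
print([row[0] for row in growth_and_decline])
```

sorted() returns a new list and leaves pop_exp_dev unchanged, which is convenient when the same table is reused for other subsets.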
The plot shows that we can expect to see the largest population increase in African countries and the largest population decline in European countries, except for Jamaica, which is expected to have the largest decline of all countries.

Subset countries in Europe
To create a subset of European countries, we manually create a list with all countries in Europe and loop over the pop_exp_dev list to find any element matching a country in the Europe list.
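The matching loop might look like the sketch below. The list name europe is an assumption, the list is shortened to two countries to keep the example small, and the numbers are placeholders rather than the UN projections.

```python
# Placeholder rows in the pop_exp_dev layout; the numbers are not real projections.
pop_exp_dev = [
    ["Japan", 90, "2030", 100, "2020", -10.0],
    ["Moldova", 80, "2030", 100, "2020", -20.0],
    ["Sweden", 100, "2020", 110, "2030", 10.0],
]

# Manually created list of countries; shortened here for illustration.
europe = ["Moldova", "Sweden"]

# Keep every row whose country name (index 0) appears in the europe list.
europe_subset = []
for row in pop_exp_dev:
    if row[0] in europe:
        europe_subset.append(row)

print([row[0] for row in europe_subset])
```

The match is an exact string comparison, so the names in the manual list need to be spelled the same way as in the dataset.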
Creating a horizontal bar plot, we can see that a majority of the countries in Europe are expected to see a decline in population by the year 2100.

Normalize the table to compare population development between countries
As some countries have large populations and some have small populations, it is difficult to compare their population development. We can solve this by normalizing the population values. We will do this by setting the year 2020 as an index year, and all other years will be presented in relation to this index year. The population of each year is divided by the population of the index year and multiplied by 100.
As you can see under the result, the year 2020 is set as the index year and has the value 100. The population of each other year is presented as a percentage of the population in 2020.
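The normalization described above can be sketched as follows. The table name pop_index and the sample rows are assumptions for illustration, and the first year column is treated as the index year.

```python
# Invented sample with integer values; the first year column is the index year.
pop_data = [
    ["country", "2020", "2025", "2030"],
    ["Country A", 1000, 1100, 1200],
    ["Country B", 5000, 4800, 4500],
]

# Start the normalized table with the unchanged header row.
pop_index = [pop_data[0]]

for row in pop_data[1:]:
    base = row[1]        # population in the index year
    new_row = [row[0]]   # keep the country name
    for value in row[1:]:
        # Each year as a percentage of the index year, rounded to one decimal.
        new_row.append(round(value / base * 100, 1))
    pop_index.append(new_row)

for normalized_row in pop_index:
    print(normalized_row)
```

Building a new list instead of overwriting pop_data keeps the original population values available for the other tables.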
Once the population data is normalized we can select which countries to plot by subsetting the list as shown before. Here, we plot the population development for Australia, Japan, Moldova, Sweden, the United Kingdom, and the United States. Of these countries, Australia is expected to see the largest population growth followed by the United States, Sweden, and the United Kingdom. Both Japan and Moldova are expected to see a decline in their populations.

Conclusion
For working with messy data or doing more advanced analyses, I would undoubtedly use Pandas. In fact, I would not recommend that anyone incorporate this workaround into their actual workflow, since it is far from an optimal way of analysing data. For example, to write efficient code, it is preferable to avoid for-loops and vectorize your operations. Still, I found that doing this light analysis on the population data helped me better grasp the basics of Python, such as how to access certain items and elements in two-dimensional lists using simple or nested for-loops, and how to create new two-dimensional lists and append items to them using for-loops. I hope you found these examples useful!
*The data used in this article is based on free material from United Nations via GAPMINDER.ORG, CC-BY LICENSE