A Data Scientist’s Approach to Understanding the Global Economy

What does the global economy actually look like? I used tools from data science to understand and visualize it.

Sometimes I think that if I keep up with the news, I will develop a qualified opinion about most matters. But news reports, well, they report the news – changes relative to a not-too-distant past. It’s like looking at the world through the windscreen of a speeding car – it’s not about the big picture. Moreover, I often find it difficult to separate opinion from fact in the current news landscape. So I thought I’d dive into the global economy and see if I could deduce a basic understanding from the data without being an expert.

To aid me in my quest, I used the Python package Pymrio, which can be used to wrangle input-output tables of the global economy and to calculate footprints. Input-output tables contain information about how money flows in the economy. Final demand in the economy drives businesses to produce goods and services. These businesses rely on other businesses’ goods and services to produce their own. This is called intermediate demand, as opposed to final demand. The amount of money in the world is not constant, so the tables also record the value added to the economy. Together, input-output tables capture the flow of money between sectors and consumers, along with the value added.
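
To make the bookkeeping concrete, here is a minimal sketch of a toy input-output table with three sectors. All numbers are invented for illustration and do not come from EORA26; the names `Z`, `Y` and `x` follow common input-output notation and are assumptions, not something taken from the analysis in this post.

```python
import numpy as np

# Intermediate demand: Z[i, j] is what sector j buys from sector i (made-up numbers).
Z = np.array([
    [10.0, 20.0,  5.0],
    [15.0,  5.0, 10.0],
    [ 5.0, 10.0, 20.0],
])

# Final demand: what consumers buy from each sector (made-up numbers).
Y = np.array([65.0, 70.0, 65.0])

# Total output of each sector = sales to other sectors + sales to final demand.
x = Z.sum(axis=1) + Y

# Value added of each sector = total output minus what it spent on inputs.
value_added = x - Z.sum(axis=0)

print("total output:", x)
print("value added: ", value_added)
```

In a balanced table like this one, total value added equals total final demand, which is a handy sanity check.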

Most countries in the world produce input-output tables. But they use different formats, and some don’t produce them every year. Luckily, some researchers compile them into data sets that allow for research into the global economy and for calculating various types of footprints. Pymrio is a package that lets you interact with these input-output tables or create your own. I used the EORA26 [1,2] data set, which contains input-output tables for 189 regions for the years 1990 to 2015 with a common set of 26 sectors. It is a condensed version of the full EORA database, in which the sectors have not been aggregated into common ones.

Pymrio allows the input-output tables to be aggregated in different ways. For example, let’s say you want to consider the European Union as a single region. Pymrio can then combine the data of the relevant regions into a single region with the common sectors, without you having to worry about the indexing. The same can be done with the sectors. Imagine that you want to compare countries but don’t care about the details of the sectors. Pymrio can then help you aggregate the sectors.
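
Conceptually, aggregating regions amounts to summing the rows and columns that belong together. Below is a minimal NumPy sketch of that idea on an invented four-region flow table with a hypothetical grouping. It only illustrates the arithmetic; on a real table, Pymrio’s own `aggregate` method does this for you, including the index bookkeeping.

```python
import numpy as np

regions = ["DEU", "FRA", "USA", "CHN"]

# Invented flows between four regions: flows[i, j] goes from region i to region j.
flows = np.array([
    [0.0, 4.0, 3.0, 2.0],
    [5.0, 0.0, 1.0, 1.0],
    [2.0, 2.0, 0.0, 6.0],
    [3.0, 1.0, 7.0, 0.0],
])

# Hypothetical grouping: merge DEU and FRA into one region called "EU".
groups = {"DEU": "EU", "FRA": "EU", "USA": "USA", "CHN": "CHN"}
new_regions = ["EU", "USA", "CHN"]

# Concordance matrix C: C[k, i] = 1 if old region i belongs to new region k.
C = np.zeros((len(new_regions), len(regions)))
for i, name in enumerate(regions):
    C[new_regions.index(groups[name]), i] = 1.0

# Sum the rows and columns of the merged regions in one step.
aggregated = C @ flows @ C.T
print(aggregated)
```

The flows that used to run between DEU and FRA end up on the diagonal entry for the merged region, so the grand total of the table is unchanged.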

I wanted to gain an understanding about how money flowed across borders – which countries were well connected and which were not? I asked myself: Could the global economy be understood in terms of a few clusters of economies that trade with each other? And how are the sectors connected on a global scale?

I aggregated all the sectors into a single sector and compared the flows of money in the intermediate demand. I introduced a distance metric 1/T_ij, where T_ij is the average of the flow of money from region i to region j and the flow from region j to region i in the intermediate demand. With this distance metric, we can represent each region as a dot on a map and place the dots such that, to a reasonable extent, the distance between two dots reflects how much the two regions trade: the more they trade, the closer the dots. I used multi-dimensional scaling (MDS) for this. The map can be seen in Figure 1, below.
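
Here is a minimal sketch of the distance metric and the embedding, on an invented four-region flow table. It uses classical (Torgerson) MDS written out in NumPy, which is an assumption made for illustration and not necessarily the MDS variant behind Figure 1. The function name `classical_mds` is hypothetical.

```python
import numpy as np

# Invented intermediate-demand flows: flows[i, j] goes from region i to region j.
flows = np.array([
    [0.0, 8.0, 1.0, 0.5],
    [6.0, 0.0, 2.0, 0.5],
    [1.0, 2.0, 0.0, 4.0],
    [0.5, 0.5, 6.0, 0.0],
])

# T[i, j]: average of the flow from i to j and the flow from j to i.
T = (flows + flows.T) / 2.0

# Distance 1 / T_ij off the diagonal; a region is at distance 0 from itself.
# Assumes every pair trades; a zero flow would give an infinite distance.
D = np.zeros_like(T)
off_diag = ~np.eye(len(T), dtype=bool)
D[off_diag] = 1.0 / T[off_diag]


def classical_mds(dist, n_components=2):
    """Embed points so that Euclidean distances approximate `dist`."""
    n = dist.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    B = -0.5 * J @ (dist ** 2) @ J                # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))  # ignore negative eigenvalues
    return eigvecs[:, order] * scale


coords = classical_mds(D)
print(coords)
```

Each row of `coords` is the position of one region on the map. Because only distances go in, the result is determined only up to rotation and reflection.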

Figure 1. Position of dots calculated with MDS directly from distance matrix. Clustering performed with hierarchical clustering from the distance matrix with complete linkage. Image by author.

Since the map is constructed only from the distances, the concepts of north, south, east and west do not make sense. The map can therefore be rotated any way you like. The map looks like a splatter of ink. The density of regions is high in the middle and there are a handful of outliers. So as a whole, the global economy appears to be very well connected, with a few regions being left out.

But the image may be deceiving us. So I tried a few different clustering methods and found that hierarchical clustering with average and complete linkage seems to give consistent and reasonable clusters, although I couldn’t come up with a good quality metric (let me know if you have a good idea).
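
As a minimal sketch of hierarchical clustering with complete linkage on a precomputed distance matrix, assuming SciPy is used (the post does not say which library was used) and with invented distances for five regions:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Invented symmetric distance matrix for five regions (zero diagonal).
D = np.array([
    [0.0, 0.2, 0.3, 2.0, 2.5],
    [0.2, 0.0, 0.25, 2.2, 2.4],
    [0.3, 0.25, 0.0, 2.1, 2.6],
    [2.0, 2.2, 2.1, 0.0, 0.4],
    [2.5, 2.4, 2.6, 0.4, 0.0],
])

# SciPy expects the condensed (upper-triangle) form of the distance matrix.
condensed = squareform(D)

# Complete linkage: the distance between two clusters is their largest pairwise distance.
tree = linkage(condensed, method="complete")

# Cut the tree into two flat clusters (the number 2 is an arbitrary choice here).
labels = fcluster(tree, t=2, criterion="maxclust")
print(labels)
```

Swapping `method="complete"` for `method="average"` gives average linkage, which makes it easy to compare how stable the clusters are between the two.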

I found that overall there are a few outliers in the global economy which are very isolated. These are represented by the orange and red dots in Figure 1. Then there is an outer region of dots which are well connected neither to each other nor to the center of the map. These loosely connected dots are shown in dark blue on the map. The whole economy revolves around a tightly connected core of countries, shown in cyan on the map. It is difficult to comprehend all 189 dots in the figure, so I also put the clusters on a world map, as shown in Figure 2.

Figure 2. Clustering results represented on a world map (made with Plotly). Image by author.

We can see that the outliers are Somalia, Moldova, Myanmar, Sudan, and South Sudan (in orange). The rest of the world is more or less split between cyan and blue. The loosely connected countries can be found in Africa, Central America and a few places in the Middle East. The rest of the world belongs to the well-connected core.

OK, so we have learned that the world is more or less split in two: those that trade a lot with each other and those that don’t. But of those that trade, who trades with whom? To find out, I converted the intermediate demand into a graph using NetworkX. A graph consists of nodes connected by edges. I designed the graph such that each node represents a region and is connected to every other node by both an in-going and an out-going edge. The direction and weight of an edge correspond to the direction and magnitude of the money flowing between the regions in the intermediate demand. My graph is dynamic in the sense that the edges change as a function of time (year). This is not supported by NetworkX, so I had to write some code to create a dynamic graph that could be visualized with Gephi.
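
A minimal sketch of such a graph in NetworkX, on invented flows for three regions and two years. This is not the code used for the post: it simply builds one static directed graph per year as a stand-in for a dynamic graph, and the helper name `build_graph` and the file name in the comment are hypothetical.

```python
import networkx as nx

regions = ["USA", "CAN", "CHN"]

# Invented flows per year: flows[year][i][j] goes from regions[i] to regions[j].
flows = {
    2013: [[0.0, 5.0, 3.0], [4.0, 0.0, 1.0], [6.0, 1.0, 0.0]],
    2014: [[0.0, 6.0, 3.5], [4.5, 0.0, 1.0], [7.0, 1.5, 0.0]],
}


def build_graph(matrix, names):
    """Directed graph with one weighted edge per ordered pair of regions."""
    graph = nx.DiGraph()
    graph.add_nodes_from(names)
    for i, source in enumerate(names):
        for j, target in enumerate(names):
            if i != j:
                graph.add_edge(source, target, weight=matrix[i][j])
    return graph


# One static graph per year; the sequence stands in for a dynamic graph.
graphs = {year: build_graph(matrix, regions) for year, matrix in flows.items()}

# Each snapshot can be written to a file that Gephi can open, for example:
# nx.write_gexf(graphs[2014], "flows_2014.gexf")
print(graphs[2014]["USA"]["CAN"]["weight"])
```

With n regions, each yearly graph has n × (n − 1) directed edges, so every pair of regions is connected in both directions.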

Figure 3 shows a screenshot of part of the graph corresponding to 2014. The positions of the nodes are calculated using a spring model (ForceAtlas2). Each edge is modeled as a spring, the stiffness of which is proportional to the edge weight. Furthermore, the nodes are modeled as electrical charges of the same sign, such that nodes that are close together repel each other more than nodes that are further apart. The system is initialized with random positions and allowed to relax until the nodes no longer move. The relative distances between the nodes therefore indicate how strongly they are connected but, again, the system is rotationally invariant. The size and color of the nodes correspond to the total in- and out-going flow of money.
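
To get a feel for this kind of layout without opening Gephi, here is a minimal sketch using NetworkX’s `spring_layout`. Note that this is the Fruchterman-Reingold force-directed algorithm, used here as a stand-in; it is related to but not the same as ForceAtlas2, so the positions will differ from Figure 3. The flows are invented.

```python
import networkx as nx

# Invented weighted, directed flows between four regions.
edges = [
    ("USA", "CAN", 6.0), ("CAN", "USA", 4.5),
    ("USA", "CHN", 3.5), ("CHN", "USA", 7.0),
    ("DEU", "USA", 2.0), ("USA", "DEU", 2.5),
    ("DEU", "CHN", 1.5), ("CHN", "DEU", 2.0),
]
graph = nx.DiGraph()
graph.add_weighted_edges_from(edges)

# Force-directed layout: edges pull nodes together in proportion to their
# weight, while all nodes repel each other. A fixed seed makes the random
# starting positions reproducible.
positions = nx.spring_layout(graph, weight="weight", seed=42)

# Node "size": total in-going plus out-going flow (weighted degree).
strength = dict(graph.degree(weight="weight"))

for node in graph.nodes:
    x, y = positions[node]
    print(f"{node}: x={x:.2f}, y={y:.2f}, total flow={strength[node]}")
```

For a directed graph, the weighted degree sums the weights of both in-going and out-going edges, which matches the quantity used for node size and color above.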

Figure 3. Graph of intermediate demand in 2014, visualized with Gephi using ForceAtlas2. Image by author.

We can see that there appear to be three major hubs of trade in the global economy: the USA, Germany (DEU) and China. Germany is apparently a trade hub for the rest of the EU. My native country, Denmark, is located close to Norway, Sweden and Finland in the model, as it is on a map, but Sweden appears to be more closely connected to Germany than Denmark is, even though Sweden, unlike Denmark, does not share a border with Germany.

A lot of trade appears to be happening between Canada and the US, and the US appears to be sending more money to Canada than it receives from Canada (that is, the US has a trade deficit with Canada). The US also appears to have a trade deficit with China. It would be interesting to see how the trade war under the previous US administration has changed this pattern, but unfortunately the data is not up to date. The US and China appear to be more connected to countries in Asia than the EU. In general, trade with China appears to be the single largest qualitative change in the global economy over the time period. Overall, I have learned from the graph that while trade is global, there is a local aspect to it as well. A few big economies dominate global trade, and their closest trade partners are either other large economies or their immediate neighbors.

This visualization takes a regional perspective. We can also change the perspective to that of the sectors. This might allow us to identify supply chains and, in general, get an idea of how money flows between the sectors on a global scale. The corresponding graph for the sectors, aggregated across all countries, is shown in Figure 4, below.

Figure 4. Visualization of graph of intermediate demand for sectors in all regions in 2014. Image by author.

Again, the size and color of the nodes indicate how closely the sectors are connected by trade – not the output of the sector. Looking at the figure, we can easily identify supply chains. Take, for instance, the chain "Hotels and Restaurants" -> "Food & Beverages" -> "Agriculture". It is clear which way the money flows, and obvious which way the products flow. Another similar chain is "Construction" -> "Petroleum, Chemical and Non-Metallic Mineral Products" -> "Mining and Quarrying". Interestingly, all sectors are reasonably closely connected to "Financial Intermediation and Business Activities". This indicates that all sectors need access to capital. As a consequence, this sector is the largest distributor of money of all the sectors.

I find it interesting how much all the sectors rely on each other. If you look closely, many supply chains are hierarchical but most are branched and a few appear circular. As a consumer, if you buy products, it is almost impossible to anticipate how your spending habits perturb this system, but in reality it is total final demand that has shaped and continues to shape the industry.

This post is concerned with the global economy from the perspective of data. But I am actually more preoccupied with understanding the climate crisis. Some of the solutions to the climate crisis may actually come from understanding how final demand shapes global trade. This is because the response of the system to a small change in final demand can be assessed by extending the tables with information about emissions from the different sectors and applying a tiny bit of linear algebra. More on that in my next post 🙂
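
As a rough preview of what that linear algebra can look like, here is the textbook Leontief model on an invented two-sector economy. This is a sketch under stated assumptions, not the analysis from the follow-up post: all numbers, including the emission intensities `s`, are made up, and the function name `emissions` is hypothetical.

```python
import numpy as np

# Invented two-sector economy (notation as in common input-output texts).
Z = np.array([[10.0, 30.0],
              [20.0, 10.0]])      # intermediate demand, Z[i, j]: sector j buys from i
y = np.array([60.0, 70.0])        # final demand
x = Z.sum(axis=1) + y             # total output per sector

# Technical coefficients: inputs needed per unit of output of each sector.
A = Z / x                         # broadcasting divides column j by x[j]

# Leontief inverse: total (direct + indirect) output per unit of final demand.
L = np.linalg.inv(np.eye(2) - A)

# Hypothetical emission intensities: emissions per unit of output.
s = np.array([0.5, 0.1])


def emissions(final_demand):
    """Total emissions caused by a given final demand."""
    return s @ L @ final_demand


# Response to one extra unit of final demand for the second sector.
dy = np.array([0.0, 1.0])
print(emissions(y + dy) - emissions(y))
```

Because the model is linear, the change in emissions depends only on `dy`, not on the starting level of final demand.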

If you are interested, you can find the code here. This is my first post. I hope you enjoyed it. Feel free to leave comments and get in touch. Please note that I am not an economist so I don’t know how my naïve analysis fits into the experts’ opinion.

[1] Lenzen, M., Kanemoto, K., Moran, D., Geschke, A. (2012) Mapping the Structure of the World Economy. Environmental Science & Technology, 46(15), 8374–8381. DOI: 10.1021/es300171x

[2] Lenzen, M., Moran, D., Kanemoto, K., Geschke, A. (2013) Building Eora: A Global Multi-regional Input-Output Database at High Country and Sector Resolution. Economic Systems Research, 25:1, 20–49. DOI: 10.1080/09535314.2013.769938

