Visualizing Fortune 500 Companies in a Bar Chart Race

Using Python and Flourish to visualize rank and revenue trends of the world's largest companies

Designed by Vectorarte / Freepik

Companies rise and fall amid intense global competition, so it is fascinating to visualize how the world's top firms have progressed over the past few decades.

The Fortune Global 500 is an annual ranking of the top 500 corporations worldwide as measured by revenue, and it serves as a good source of data for running visual analysis. I figured it would also be an enriching experience to generate bar chart race animations using code (Python) and no-code (Flourish) solutions. Let’s get started!

Table of Contents

(1) Data Acquisition
(2) Data Preparation
(3) Bar Chart Race with Python bar-chart-race package
(4) Bar Chart Race with Flourish
(5) Additional Insights


Data Acquisition

Publicly available data for revenue and rank of Fortune Global 500 companies was retrieved from the Fortune 500 Search page. The data was extracted without any form of web scraping since the size of the dataset is small, and the simple method of copy-pasting values into Excel sheets was far more efficient than writing the scraping code.

Although recent data included useful information such as profits, assets, and number of employees, these features were omitted in this project as they were available only from 2015 onward. Hence, this analysis focused on revenue (USD), which was available all the way back to 1995.

With that, I collated a dataset of ~13,000 observations of the annual revenue and rankings of the Global 500 companies over 26 years (1995–2020).

Note: There is an error in the original Fortune 500 data source, where datasets for 2007 and 2008 are duplicates of each other.


Data Preparation

1) Data Pre-Processing

We first import the Excel sheets into a Python Jupyter notebook (using the openpyxl package) and concatenate them into a single Pandas DataFrame.

Random sample (10 rows) of the concatenated DataFrame
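The import-and-concatenate step can be sketched as follows. This is only an illustration under assumptions that are not confirmed by the original notebook: one sheet per year, each sheet named after its year, and columns called `Rank`, `Name` and `Revenue`. The workbook filename in the comment is hypothetical, and the demo values are placeholders rather than real figures.

```python
import pandas as pd


def combine_sheets(sheets: dict) -> pd.DataFrame:
    """Stack one DataFrame per year into a single long DataFrame."""
    frames = []
    for sheet_name, frame in sheets.items():
        frame = frame.copy()
        # Assumes each sheet is named after its year, e.g. '1995'
        frame["Year"] = int(sheet_name)
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


# With a real workbook (hypothetical filename), all sheets can be read at once:
# sheets = pd.read_excel("fortune_global_500.xlsx", sheet_name=None, engine="openpyxl")
# master_df = combine_sheets(sheets)

# Tiny stand-in for two yearly sheets (placeholder values, not real figures)
demo_sheets = {
    "1995": pd.DataFrame({"Rank": [1, 2], "Name": ["A", "B"], "Revenue": [100.0, 90.0]}),
    "1996": pd.DataFrame({"Rank": [1, 2], "Name": ["B", "A"], "Revenue": [120.0, 105.0]}),
}
master_df = combine_sheets(demo_sheets)
```

Passing `sheet_name=None` to `pd.read_excel` returns a dictionary of DataFrames keyed by sheet name, which is why the helper takes a dictionary.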

The pre-processing step is vital primarily because we need a consistent identifier (i.e. a standardized name) for each unique company in order to track its progress over time. This is complicated by several issues over the years:

  • Mergers and acquisitions led to different company names (e.g. ‘Exxon’ + ‘Mobil’ = ‘ExxonMobil’)

  • Names of corporations change owing to rebranding efforts (e.g. ‘Amazon.com’ -> ‘Amazon’)

  • Format of company names not kept consistent (e.g. ‘Wal-Mart Stores’ vs ‘Walmart’)

  • Different versions stemming from abbreviated and unabbreviated names (e.g. ‘British Petroleum’ vs ‘BP’)

Additional effort (e.g. scouring the Fortune 500 website) went into understanding the companies’ histories so as to accurately capture the names representing them.

With the end goal of visualizing only the top 10 companies in a bar chart race, I simplified the data cleaning process by focusing on companies that have ever made it into the top 10 list. Two rounds of pre-processing were conducted:

Round 1: Remove filler words in company names, e.g. ‘Co., Ltd.’, ‘Company’, ‘P.L.C.’, ‘Corp.’, ‘A.G.’ etc.

Round 2: Identify different variations of company names, and replace them with standardized ones. This allows for accurate data retrieval of these companies over the years.

Here is a snippet of the pre-processing code:

# Map each name variant to one standardized name per company
name_map = {
    'Apple Computer': 'Apple',
    'Amazon.com': 'Amazon',
    'American International Group': 'AIG',
    'British Petroleum': 'BP',
}
master_df['Name'] = master_df['Name'].replace(name_map)
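Round 1 can be handled in a similar spirit with a regular expression. The sketch below assumes the filler terms appear only at the end of a name, which may not hold for every entry. `FILLERS`, `FILLER_PATTERN` and `strip_fillers` are hypothetical names introduced for illustration, and the sample names are examples only.

```python
import re

import pandas as pd

# Filler terms taken from the Round 1 description; extend as needed
FILLERS = ["Co., Ltd.", "Company", "P.L.C.", "Corp.", "A.G."]

# Match one filler term at the end of a name, along with the whitespace before it
FILLER_PATTERN = re.compile(
    r"\s+(?:" + "|".join(re.escape(term) for term in FILLERS) + r")\s*$"
)


def strip_fillers(name: str) -> str:
    """Remove a trailing filler term such as 'Corp.' from a company name."""
    return FILLER_PATTERN.sub("", name).strip()


names = pd.Series(["Toyota Motor Corp.", "Ford Motor Company", "Walmart"])
cleaned = names.map(strip_fillers)
```

Requiring whitespace before the filler term keeps the pattern from trimming names that merely end in the same letters.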

2) DataFrame Formatting

After verifying that pre-processing was done successfully (by manually checking the output), we need to convert the DataFrame into a wide format for bar chart race visualization. The required format is as follows:

  • Time component (i.e. year) serves as index
  • Every row contains values for that particular year (in index)
  • Every column contains values for each of the respective companies
  • Values represent each company’s annual revenue (in USD millions)

The formatted DataFrame should look like this:

Snapshot of the formatted DataFrame for bar chart race visualization

Note: ‘0’ value means that the company was not in the Global 500 list that year
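One way to produce this wide layout is a pandas pivot. The sketch below is an illustration rather than the original notebook code: it assumes a long DataFrame with columns `Year`, `Name`, `Rank` and `Revenue` (only `Name` is confirmed by the snippet above), and `to_wide` is a hypothetical helper. The demo values are placeholders.

```python
import pandas as pd


def to_wide(long_df: pd.DataFrame, top_n: int = 10) -> pd.DataFrame:
    """Pivot to years x companies, keeping companies that ever ranked in the top_n."""
    ever_top = long_df.loc[long_df["Rank"] <= top_n, "Name"].unique()
    subset = long_df[long_df["Name"].isin(ever_top)]
    # pivot raises if a (Year, Name) pair appears twice, which flags leftover name clashes
    wide = subset.pivot(index="Year", columns="Name", values="Revenue")
    # 0 marks years in which a company was not on the list
    return wide.fillna(0)


# Placeholder values, not real figures
long_df = pd.DataFrame({
    "Year": [1995, 1995, 1996, 1996],
    "Name": ["A", "B", "A", "C"],
    "Rank": [1, 2, 1, 2],
    "Revenue": [100.0, 90.0, 110.0, 95.0],
})
df_wide = to_wide(long_df)
```

Using `pivot` rather than `pivot_table` is a deliberate choice here: duplicates surface as an error instead of being silently aggregated.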

Now that the dataset is properly formatted, let’s explore two ways to visualize the bar chart race, namely with Python bar-chart-race and with Flourish.


Bar Chart Race with Python bar-chart-race package

The Python matplotlib library already lets us build bar chart races relatively easily, as demonstrated in this article. In order to do it even more efficiently with fewer lines of code, I leveraged the bar-chart-race package, which was built on top of matplotlib. Its official documentation can be found here.

This is the Python code used to generate the bar chart race (where the output is a .mp4 file):
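A minimal sketch of such a call, assuming the wide DataFrame from the previous section is named `df_wide` (a hypothetical name). The settings are illustrative rather than the exact ones behind the animation, and `render_race` and the output filename are likewise made up for this example.

```python
import pandas as pd


def render_race(df_wide: pd.DataFrame, filename: str = "fortune_global_500.mp4") -> None:
    """Render a top-10 bar chart race from a wide DataFrame (years x companies)."""
    # Imported inside the function so the module loads without the package installed
    import bar_chart_race as bcr

    bcr.bar_chart_race(
        df=df_wide,
        filename=filename,      # writing .mp4 requires FFmpeg on the PATH
        orientation="h",        # horizontal bars
        sort="desc",            # largest revenue at the top
        n_bars=10,              # show only the top 10 per year
        steps_per_period=20,    # frames interpolated between consecutive years
        period_length=1000,     # milliseconds per year
        title="Fortune Global 500: top 10 companies by revenue",
        figsize=(6, 3.5),
        dpi=144,
    )
```

Calling `render_race(df_wide)` would then write the video file to the working directory.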

Here is the bar chart race output (displayed as a low-resolution gif):

Bar Chart Race Visualization in Python (Values represent annual revenue in USD$)

Note: To save the animation as an .mp4 file (which can later be converted into a gif), we need to install FFmpeg and add the FFmpeg bin folder to the system PATH.

While the bar-chart-race package lets us create basic functional bar chart race animations, it still leaves a lot to be desired. Some drawbacks include:

  • Multiple code changes needed in order to make granular aesthetic changes (e.g. color, positioning etc.)
  • Limited options available for the animation customization
  • Inability to visualize charts right after changes are made

With that, let’s check out how Flourish can overcome these issues by allowing us to easily generate beautiful visualizations without any code.


Bar Chart Race with Flourish

Flourish is a powerful and flexible platform for data visualization and storytelling, making it easy for users to turn spreadsheets into interactive visualizations. It comes with a host of features and templates (including bar chart race), allowing for easy edits using simple point-and-click options.

After exporting the formatted dataset as an Excel document, import it into a new visualization on the Flourish platform. The following bar chart race can then be generated with just a few steps:

While I derive joy from writing functional code, I very much prefer tools that can help me complete tasks in a simpler and more pragmatic fashion. Flourish certainly fits the bill here.

Flourish offers an extensive range of customizations, including a Playback button for users to pause or rewind the bar chart race (try clicking the interactive graph above!).

This beautiful visualization was done with the Free plan, where the only downsides are that the data and project must be public, and HTML versions of the charts cannot be downloaded locally.

Because this visualization is public, feel free to duplicate my bar chart race and experiment with it on Flourish by clicking the ‘Duplicate and edit’ button at the top right corner:

Click on the top right button to duplicate Flourish visualization | Image by author

Additional Insights

Top Ranked Company

General Motors led the charts in the 1990s, but since 2002, Walmart has been the dominant force in the rankings. In fact, Walmart has been ranked number one an incredible 15 times in the past 26 years (with no signs of letting up).

Walmart has frequently been the world's largest company by revenue
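A count like this can be derived directly from the long-format data. The sketch below assumes columns named `Year`, `Name` and `Rank` and uses placeholder companies rather than real rankings. `years_at_number_one` is a hypothetical helper.

```python
import pandas as pd


def years_at_number_one(long_df: pd.DataFrame) -> pd.Series:
    """Count how many years each company held the number one rank."""
    return long_df.loc[long_df["Rank"] == 1, "Name"].value_counts()


# Placeholder rankings, not real data
long_df = pd.DataFrame({
    "Year": [2001, 2001, 2002, 2002, 2003, 2003],
    "Name": ["A", "B", "B", "A", "B", "A"],
    "Rank": [1, 2, 1, 2, 1, 2],
})
counts = years_at_number_one(long_df)
```

Because `value_counts` sorts in descending order, the first entry of `counts` is the company with the most years at the top.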

Comparing Top and Bottom ranked companies

Over the years, the difference in revenue between the top-ranked and bottom-ranked Global 500 companies has widened steadily. This is mainly driven by significant revenue increases of the top-ranked company over time.

Rank and revenue trends of top and bottom ranked Global 500 companies (Visualization done with Plotly)

The revenue difference back in 1995 was $167,992M, and that has since roughly tripled to $498,578M in 2020. Interestingly, the ‘cut-off’ revenue for a company to gain entry into the Global 500 list has remained relatively stable, between about $17,000M and $22,000M.
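The gap itself is simple to compute per year, and the two figures quoted above give the growth factor. In this sketch, `revenue_gap_by_year` is a hypothetical helper that assumes `Year` and `Revenue` columns, and the demo rows are placeholders. Only the 1995 and 2020 gap values come from the text.

```python
import pandas as pd


def revenue_gap_by_year(long_df: pd.DataFrame) -> pd.Series:
    """Difference between the highest and lowest revenue on the list in each year."""
    grouped = long_df.groupby("Year")["Revenue"]
    return grouped.max() - grouped.min()


# Placeholder rows, not real figures
demo = pd.DataFrame({
    "Year": [1995, 1995, 1996, 1996],
    "Revenue": [100.0, 10.0, 130.0, 12.0],
})
demo_gap = revenue_gap_by_year(demo)

# Gap values quoted in the text (USD millions)
gap_1995 = 167_992
gap_2020 = 498_578
growth = gap_2020 / gap_1995  # about 2.97, i.e. close to a tripling
```

Since the list is ranked by revenue, the maximum and minimum revenue in a year correspond to the top-ranked and bottom-ranked companies.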


Conclusion

In this article, I showed how to acquire and pre-process data (in Python) for the visualization of rank and revenue trends of Fortune Global 500 companies. I also discussed how to create bar chart race animations using code (Python bar-chart-race) and no-code (Flourish) solutions. The Python code for this project can be found on this GitHub page.

Before you go

I welcome you to join me on a Data Science learning journey! Follow this Medium page and check out my GitHub to stay in the loop of more exciting data science content. Meanwhile, have fun using Python with Flourish!
