Bar Chart Race?

Zoe Zhu
Towards Data Science
4 min read · Oct 30, 2019

My curiosity started with the “Bar Chart Race” type of chart that is so popular on reddit/dataisbeautiful and youtube/wawamustats.

There is an even easier way: you can input your parameters and make your own bar chart race at https://app.flourish.studio/@flourish/bar-chart-race. It looks like this:

https://youtu.be/dwCXwaBghpc

There are so many models and methods that data scientists love to apply, but when it comes to explaining results to people with various levels of technical skill, graphs grab attention and make for a better narrative.

To practice and show an example, I took a dataset from the FiveThirtyEight GitHub repo: https://github.com/fivethirtyeight/data/ (if you are an R user, there is a package you can install directly and have fun with). My practice dataset is https://github.com/fivethirtyeight/data/tree/master/endorsements-june-30. It covers endorsements before the primary elections: it records the endorsement points each candidate had before June 30, from 2000 to 2012.

If you are interested in how I got to a racing bar graph, you can read my notes on this tutorial on how to start with a simple Matplotlib animation using the matplotlib.animation module.

Let’s step through this and see what’s going on. After importing the required modules, an easy way to get a FiveThirtyEight style is to simply use style.use(). Matplotlib also has an animation module that can do most of the work for you.

  • Step 1: Import libraries
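The original import cell is not reproduced here, so the following is only a minimal sketch of what this step might look like. It assumes pandas is used to hold the data and selects the non-interactive Agg backend so the script also runs without a display; neither choice is confirmed by the original post.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no display is required

import matplotlib.pyplot as plt
import matplotlib.animation as animation
import pandas as pd  # assumption: pandas is used for loading and filtering the data

# "fivethirtyeight" is one of the style sheets that ships with Matplotlib
plt.style.use("fivethirtyeight")
```

Calling `plt.style.use` once at the top applies the style to every figure created afterwards.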

Let us first try to graph the bar chart that will show up at the end of the animation, which is the last frame the audience should see.

  • Step 2: Layout basic bar chart
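A rough sketch of this step, under stated assumptions: the rows below are placeholder values, not the real FiveThirtyEight figures, and the column names `year`, `candidate` and `endorsement_points` are hypothetical stand-ins for whatever the CSV actually uses. It filters the last year and draws one horizontal bar per candidate.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder rows for illustration only, NOT the real endorsement data
df = pd.DataFrame({
    "year": [2008, 2008, 2012, 2012, 2012],
    "candidate": ["Candidate A", "Candidate B", "Candidate C",
                  "Candidate D", "Candidate E"],
    "endorsement_points": [10, 25, 40, 5, 18],
})

# Keep only the final year and sort ascending, so the leader ends up on top
last_year = df["year"].max()
frame = df[df["year"] == last_year].sort_values("endorsement_points")

fig, ax = plt.subplots(figsize=(8, 4))
bars = ax.barh(frame["candidate"], frame["endorsement_points"])
ax.set_title(f"Endorsement points, {last_year}")
```

Sorting in ascending order matters because `barh` draws the first row at the bottom of the axis.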

  • Step 3: Add colors and group data

I format the colors using the Python color constants module:
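The original color cell is missing, so this sketch swaps in Matplotlib's built-in named colors rather than the color constants module mentioned above. It assumes the bars are grouped by a hypothetical `party` column; the rows are placeholder values, not real data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder rows for illustration only, NOT the real endorsement data
frame = pd.DataFrame({
    "candidate": ["Candidate C", "Candidate D", "Candidate E"],
    "party": ["Democratic", "Republican", "Democratic"],
    "endorsement_points": [40, 5, 18],
}).sort_values("endorsement_points")

# Assumption: one color per group, here keyed on party
party_colors = {"Democratic": "royalblue", "Republican": "firebrick"}
bar_colors = frame["party"].map(party_colors).tolist()

fig, ax = plt.subplots(figsize=(8, 4))
bars = ax.barh(frame["candidate"], frame["endorsement_points"], color=bar_colors)
```

Keeping the group-to-color mapping in one dictionary means every frame of the animation colors a given group the same way.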

  • Step 4: Put them into a function
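One way this step could look, as a sketch only: `draw_barchart` is a hypothetical name, and the data and column names are placeholders rather than the real dataset. The function clears the axes and redraws the bars for a single year, which is the shape of callback that the animation module expects.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder rows for illustration only, NOT the real endorsement data
df = pd.DataFrame({
    "year": [2008, 2008, 2012, 2012, 2012],
    "candidate": ["Candidate A", "Candidate B", "Candidate C",
                  "Candidate D", "Candidate E"],
    "endorsement_points": [10, 25, 40, 5, 18],
})

fig, ax = plt.subplots(figsize=(8, 4))

def draw_barchart(year):
    """Redraw the horizontal bar chart for one year on the shared axes."""
    frame = df[df["year"] == year].sort_values("endorsement_points")
    ax.clear()  # remove the previous frame's bars and labels
    ax.barh(frame["candidate"], frame["endorsement_points"])
    ax.set_title(f"Endorsement points, {year}")

draw_barchart(2012)  # draws the final frame
```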

  • Step 5: Use animation.FuncAnimation to finalize the picture

You can also export it as a GIF or MOV file. In the .save function, change fps (frames per second) to adjust the speed and dpi to adjust the resolution.
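A minimal sketch of wiring a per-year drawing function into `animation.FuncAnimation`. The `draw_barchart` helper, the placeholder data and the `interval` value are all assumptions made for illustration, not the author's original code.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import pandas as pd

# Placeholder rows for illustration only, NOT the real endorsement data
df = pd.DataFrame({
    "year": [2008, 2008, 2012, 2012, 2012],
    "candidate": ["Candidate A", "Candidate B", "Candidate C",
                  "Candidate D", "Candidate E"],
    "endorsement_points": [10, 25, 40, 5, 18],
})

fig, ax = plt.subplots(figsize=(8, 4))

def draw_barchart(year):
    """Hypothetical helper: redraw the bars for one year."""
    frame = df[df["year"] == year].sort_values("endorsement_points")
    ax.clear()
    ax.barh(frame["candidate"], frame["endorsement_points"])
    ax.set_title(f"Endorsement points, {year}")

# One animation frame per year, in chronological order
years = sorted(int(y) for y in df["year"].unique())

# FuncAnimation calls draw_barchart(year) for each value in `frames`;
# `interval` is the delay between frames in milliseconds
anim = animation.FuncAnimation(fig, draw_barchart, frames=years,
                               interval=800, repeat=False)
```

Keep a reference to the returned object (here `anim`), because the animation stops if it is garbage-collected.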
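As a sketch of the export step, the snippet below saves a GIF with Matplotlib's `PillowWriter`. The data, the `draw_barchart` helper, the output file name and the `fps`/`dpi` values are illustrative assumptions. Writing a video file such as MOV or MP4 would instead need a writer backed by ffmpeg, which has to be installed separately.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import pandas as pd

# Placeholder rows for illustration only, NOT the real endorsement data
df = pd.DataFrame({
    "year": [2008, 2008, 2012, 2012],
    "candidate": ["Candidate A", "Candidate B", "Candidate C", "Candidate D"],
    "endorsement_points": [10, 25, 40, 5],
})

fig, ax = plt.subplots(figsize=(6, 3))

def draw_barchart(year):
    """Hypothetical helper: redraw the bars for one year."""
    frame = df[df["year"] == year].sort_values("endorsement_points")
    ax.clear()
    ax.barh(frame["candidate"], frame["endorsement_points"])
    ax.set_title(str(year))

anim = animation.FuncAnimation(fig, draw_barchart, frames=[2008, 2012],
                               repeat=False)

# fps sets the playback speed (frames per second); dpi sets the resolution
out_path = os.path.join(tempfile.mkdtemp(), "bar_chart_race.gif")
anim.save(out_path, writer=animation.PillowWriter(fps=1), dpi=80)
```

A lower `fps` gives viewers more time to read each year; a higher `dpi` produces a sharper but larger file.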

With hindsight bias I can certainly read the chart: campaign funding does not predict a candidate’s performance. But other than that, a cool animated bar graph doesn’t tell you much going forward. After all, if we want a truly scientific and analytical piece, making a visualization is far from enough. I looked into the approach of Nate Silver and his team. Here are some key takeaways.

Ask the right question first

A bar chart race is a typical choice when there is a wide range of x-axis values and a categorical variable on the y-axis. It is fascinating to see drastic change over the years or who is “taking the lead” in a certain industry. However, for a specific dataset like this one, there is not enough yearly data to plot a bar chart race. After this attempt, I will never again plot just for the sake of aesthetics or because a chart type is popular.

According to an interview with FiveThirtyEight’s editors and commentators, in their major “projects” the team presents a question (“Who will win the presidency in 2016?”, “How popular/unpopular is Donald Trump?”), then determines which types of datasets to include and constructs a model accordingly. The method is loosely based on the baseball-derived tactic of “sabermetrics,” where a statistician gathers both a broad and deep range of possibly relevant quantitative information and uses it to model potential future outcomes. The word “scientist” is a reminder for all of us that an engineering mindset and a scientific mindset are both essential along the way.

Accuracy in graphs

Rather than foregrounding a percentage, the primary visual indicator of the above bar graph is a frequency count. This visualization underscores the scientific nature of data analysis rather than offering the desirable, but more subject-to-interpretation, glimpse into the future given by the Now-cast’s percentage. This approach also contributes to ongoing questions in data visualization, where the aesthetics of “beautiful” data intersect with questions of how to accurately convey information, a challenge of form familiar from artistic debates in general. (1)

Beyond creating accurate graphs, there is the problem that numbers are not always interpreted consistently. Nate Silver mentioned in an interview that 73% of readers misread the graphs and results the team posted, saying that some of them don’t interpret odds and percentages correctly. Data journalism seems to have so much power over us that it could change our perception and behavior.

References:

  1. Black, T. (2018). The Numbers Don’t Lie: Performing Facts and Futures in FiveThirtyEight’s Probabilistic Forecasting. Theatre Journal, 70(4), 519–538.
  2. Stigler, S. M. (2016). The Seven Pillars of Statistical Wisdom. Harvard University Press.

Passionate about #psychometrics#, #datascience#, #socialinjustice, #languages#, #statistics#. @Flatiron, @Northwell Health