Tell A Story with Data
We have an enormous amount of data to work with, analyze, and explore, and that amount is only going to grow as the technology around us advances. Now, imagine having to stare at thousands of rows of data in a spreadsheet, trying to find hidden patterns and track down changes in the numbers that could be useful in your post-analysis interpretation.
Doesn’t sound fun, does it?
That is where data visualization comes in. Having a visual summary of information makes it easier to identify patterns and trends than looking through rows of a spreadsheet. Humans are visual creatures: we interact with and respond better to visual stimulation, and visualizing data is one way to help us understand it. Since the purpose of data analysis is to gain insights and find patterns, visualizing the data makes it much more valuable and easier to explore. Even if a data scientist can reach insights from data without visualization, it will be harder to communicate their meaning to others. The different types of charts and graphs make communicating data findings faster and more efficient.
The importance of visualizing data goes beyond easing up the interpretation of the data. Visualizing the data can have many benefits, such as:
- Showing the change in the data over time.
- Determining the frequency of relevant events.
- Pointing out the correlation between different events.
- Analyzing value and risk of different opportunities.
In this article, I will talk about a Python library that can help us create eye-catching, stunning, interactive visualizations. The library is Pygal.
Without further ado, let’s get into it…
Pygal
When it comes to visualizing data in Python, most data scientists go with the famous Matplotlib, Seaborn, or Bokeh. However, one library that is often overlooked is Pygal. Pygal allows the user to create beautiful interactive plots that can be turned into SVGs with an optimal resolution for printing or for display on webpages using Flask or Django.
Getting familiar with Pygal
Pygal offers a wide variety of charts that we can use to visualize data. To be precise, there are 14 chart categories in Pygal, such as histograms, bar charts, pie charts, treemaps, gauges, and more.
To use Pygal’s magic, we first need to install it.
$ pip install pygal
Let’s plot our first chart. We will start with the simplest chart, a bar chart. To plot a bar chart using Pygal, we need to create a chart object and then add some values to it.
import pygal
bar_chart = pygal.Bar()
We will plot the factorials of the numbers from 0 to 10. Here I defined a simple function to calculate the factorial of a number and then used it to generate a list of factorials for the numbers from 0 to 10.
def factorial(n):
    if n == 1 or n == 0:
        return 1
    else:
        return n * factorial(n-1)
fact_list = [factorial(i) for i in range(11)]
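Now, we can use this to create our plot. The display, HTML, and base_html names used in the last line are defined in the Application section below.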
bar_chart = pygal.Bar(height=400)
bar_chart.add('Factorial', fact_list)
display(HTML(base_html.format(rendered_chart=bar_chart.render(is_unicode=True))))
This will generate a beautiful, interactive plot.

If we want to plot different kinds of charts, we will follow the same steps. As you might’ve noticed, the primary method used to link data to charts is the add method.
Now, let’s start building something based on real-life data.
Application
For the rest of this article, I will be using this dataset of the COVID-19 cases in the US to explain different aspects of the Pygal library.
First, to make sure everything works smoothly, we need to ensure two things:
- That we have both Pandas and Pygal installed.
- In Jupyter Notebook, we need to enable IPython display and HTML options.
from IPython.display import display, HTML
base_html = """
<!DOCTYPE html>
<html>
<head>
<script type="text/javascript" src="http://kozea.github.com/pygal.js/javascripts/svg.jquery.js"></script>
<script type="text/javascript" src="https://kozea.github.io/pygal.js/2.0.x/pygal-tooltips.min.js"></script>
</head>
<body>
<figure>
{rendered_chart}
</figure>
</body>
</html>
"""
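The template relies on plain Python string formatting: {rendered_chart} is a placeholder that str.format replaces with the chart’s SVG markup. Here is a minimal sketch of that mechanism, using a shortened template and a stand-in SVG string (both hypothetical, so the example runs without Pygal or Jupyter).

```python
# Shortened stand-in for base_html; only the placeholder matters here.
template = """
<figure>
{rendered_chart}
</figure>
"""

# Hypothetical stand-in for the string returned by chart.render(is_unicode=True).
fake_svg = '<svg xmlns="http://www.w3.org/2000/svg"></svg>'

# str.format substitutes the keyword argument for the named placeholder.
page = template.format(rendered_chart=fake_svg)
print(page)
```

One thing to watch: str.format treats single braces as placeholders, so any literal braces added to the template later (inline CSS, for example) would need to be doubled as {{ and }}.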
Now that we are all set up, we can start exploring our data with Pandas and then manipulate and prepare it for plotting using different kinds of charts.
import pygal
import pandas as pd
data = pd.read_csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv")
This dataset contains the number of COVID-19 cases and deaths by date, county, and state. We can use data.columns to get an idea of the shape of the data. Executing that command will return:
Index(['date', 'county', 'state', 'fips', 'cases', 'deaths'], dtype='object')
We can get a sample of 10 rows to see what our data frame looks like.
data.sample(10)

Bar Chart
Let’s start by plotting a bar chart that displays the mean of the number of cases per state. To do that, we need to execute the following steps:
- Group our data by state, extract the number of cases for each state, and then compute the mean value for each state.
mean_per_state = data.groupby('state')['cases'].mean()
- Start building the data and adding it to the bar chart.
barChart = pygal.Bar(height=400)
for state, mean_cases in mean_per_state.items():
    barChart.add(state, mean_cases)
display(HTML(base_html.format(rendered_chart=barChart.render(is_unicode=True))))
And, voila, we have a bar chart. We can remove a series by deselecting it from the legend list, and we can add it back by selecting it again.
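With one series per state, the legend can get crowded. One option is to sort the means and keep only the largest few before adding them to the chart. Here is a minimal sketch on a small made-up frame with the same state and cases columns (the state names and numbers are invented).

```python
import pandas as pd

# Invented rows that mimic the 'state' and 'cases' columns of the dataset.
sample = pd.DataFrame({
    'state': ['A', 'A', 'B', 'B', 'C', 'C'],
    'cases': [10, 30, 100, 300, 1, 3],
})

# Mean cases per state, largest first, keeping the top 2.
top_means = (
    sample.groupby('state')['cases']
    .mean()
    .sort_values(ascending=False)
    .head(2)
)

# Each (state, mean) pair is what would be passed to barChart.add(...).
for state, mean_cases in top_means.items():
    print(state, mean_cases)
```

On the real data, the same chain would start from data instead of sample, with a larger number passed to head.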

Treemap
Bar charts help show the overall data, but if we want to get more specific, we can choose a different type of chart, namely, a treemap. Treemaps are useful for showing categories within the data. For example, in our dataset, we have the number of cases for each county in every state. The bar chart was able to show us the mean of every state, but we couldn’t see the case distribution per county per state. One way we can approach that is by using treemaps.
Let’s assume we want to see the detailed distribution of cases for the 10 states with the highest number of cases. We need to manipulate our data before plotting it.
- We need to sort the data based on cases and then group them by states.
sort_by_cases = data.sort_values(by=['cases'],ascending=False).groupby(['state'])['cases'].apply(list)
- Use the sorted list to get the top 10 states with the highest number of cases.
top_10_states = sort_by_cases[:10]
- Use this sublist to create our treemap.
treemap = pygal.Treemap(height=400)
for state, cases in top_10_states.items():
    treemap.add(state, cases[:10])
display(HTML(base_html.format(rendered_chart=treemap.render(is_unicode=True))))
This treemap, however, is not labeled, so we can’t see the county names when we hover over the blocks. Instead, every county block shows only the name of its state. To avoid this and add the county names to our treemap, we need to label the data we’re feeding to the graph.

Before we do that, note that our data is updated daily, so each county appears many times. Since we care about the overall number of cases in each county, we need to clean up our data before adding it to the treemap.
# Get the cases by county for all states
cases_by_county = data.sort_values(by=['cases'], ascending=False).groupby('state').apply(
    lambda x: [{"value": l, "label": c} for l, c in zip(x['cases'], x['county'])])
cases_by_county = cases_by_county[:10]
# Create a new dictionary that contains the cleaned-up version of the data
clean_dict = {}
start_dict = cases_by_county.to_dict()
for key in start_dict.keys():
    values = []
    labels = []
    county = []
    # Merge repeated counties by summing their values
    for item in start_dict[key]:
        if item['label'] not in labels:
            labels.append(item['label'])
            values.append(item['value'])
        else:
            i = labels.index(item['label'])
            values[i] += item['value']
    for l, v in zip(labels, values):
        county.append({'value': v, 'label': l})
    clean_dict[key] = county
# Convert the data to a Pandas series to add it to the treemap
new_series = pd.Series(clean_dict)
Then we can add the series to the treemap and plot a labeled version of it.
treemap = pygal.Treemap(height=200)
# Series.iteritems() was removed in pandas 2.0; items() is the replacement
for state, counties in new_series.items():
    treemap.add(state, counties[:10])
display(HTML(base_html.format(rendered_chart=treemap.render(is_unicode=True))))
Awesome! Now our treemap is labeled. If we hover over the blocks now, we can see the name of the county, the state, and the number of cases in this county.
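The cleanup loop above can also be expressed with Pandas directly. The sketch below is an alternative under two assumptions of mine, not the code used for the charts in this article: that the cases column is a running total (so the latest row per county is the one to keep), and that the "top" states are those with the highest total across their counties. It also ranks the states explicitly, because groupby sorts its keys alphabetically by default and slicing the first 10 groups does not by itself order states by cases. The data here is made up.

```python
import pandas as pd

# Invented rows with the same columns as the dataset (fips and deaths omitted).
sample = pd.DataFrame({
    'date':   ['2020-04-01', '2020-04-02', '2020-04-01', '2020-04-02', '2020-04-02'],
    'county': ['X', 'X', 'Y', 'Y', 'Z'],
    'state':  ['A', 'A', 'A', 'A', 'B'],
    'cases':  [5, 8, 1, 2, 50],
})

# Keep only the most recent row for every (state, county) pair.
latest = (
    sample.sort_values('date')
    .drop_duplicates(subset=['state', 'county'], keep='last')
)

# Rank states by their summed county totals and keep the top N (N=1 here).
top_states = latest.groupby('state')['cases'].sum().nlargest(1).index

# Build {state: [{'value': ..., 'label': ...}, ...]} for treemap.add(...).
labeled = {
    state: [
        {'value': int(row.cases), 'label': row.county}
        for row in group.sort_values('cases', ascending=False).itertuples()
    ]
    for state, group in latest[latest['state'].isin(top_states)].groupby('state')
}
print(labeled)
```

Each key and list in labeled can then be passed to treemap.add in the same way as the entries of new_series.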

Pie Chart
Another way to present this information is with a pie chart that shows the 10 states with the highest number of cases. Using a pie chart, we can see the percentage of cases in one state relative to the others.
Since we already did all the data frame manipulation, we can use that to create the pie chart right away.
first10 = list(sort_by_cases.items())[:10]
pi_chart = pygal.Pie(height=400)
for state, cases in first10:
    pi_chart.add(state, cases)
display(HTML(base_html.format(rendered_chart=pi_chart.render(is_unicode=True))))

Gauge chart
The last type of chart we will talk about is the gauge chart. A gauge chart looks like a donut, and it is useful for comparing values between a small number of variables. So, we will compare the first 5 states in our per-state means.
The gauge chart has two shapes: the donut shape, which Pygal calls SolidGauge, and the needle shape, which Pygal calls Gauge.
The Donut Shape
gauge = pygal.SolidGauge(inner_radius=0.70)
# Series.iteritems() was removed in pandas 2.0; items() is the replacement
for state, mean_cases in mean_per_state.head().items():
    gauge.add(state, [{"value": mean_cases * 100}])
display(HTML(base_html.format(rendered_chart=gauge.render(is_unicode=True))))

The Needle Shape
gauge = pygal.Gauge(human_readable=True)
# Series.iteritems() was removed in pandas 2.0; items() is the replacement
for state, mean_cases in mean_per_state.head().items():
    gauge.add(state, [{"value": mean_cases * 100}])
display(HTML(base_html.format(rendered_chart=gauge.render(is_unicode=True))))

Styling
Pygal also gives us the chance to play with the colors of the charts; the styles already defined in the library are:
- Default
- Dark
- Neon
- Dark Solarized
- Light Solarized
- Light
- Clean
- Red Blue
- Dark Colorized
- Light Colorized
- Turquoise
- Light Green
- Dark Green
- Dark Green Blue
- Blue
To use the built-in styles, you will need to import the style you want, or you can import them all.
from pygal.style import *
Here are some examples of different built-in styles.




Aside from these styles, you can define a custom style by setting the parameters of a Style object. Some of the properties that can be edited are colors, which sets the series colors, and background and foreground, which set the colors of the chart’s background and foreground, respectively. You can also edit the opacity and font properties of the charts.
Here’s the style object for my custom style 😄
from pygal.style import Style
custom_style = Style(
    background='transparent',
    plot_background='transparent',
    font_family='googlefont:Bad Script',
    colors=('#05668D', '#028090', '#00A896', '#02C39A', '#F0F3BD'))
Note: the font_family property won’t work if you include the SVG directly; you have to embed it, because the Google stylesheet is added in the XML processing instructions.

Phew…
That was a lot of charts and colors…
The Pygal library offers many more options, more graph types, and more ways to embed the resulting SVG on different websites. One of the reasons I like working with Pygal so much is that it allows the user to unleash their creativity and create enchanting graphics that are interactive, clear, and colorful.
References
[1] Pygal documentation http://www.pygal.org/en/stable/index.html