Data Analysis and Visualization with Jupyter Notebook

Visualize data in Jupyter Notebook with JS libraries

Veronika Rovnik
Towards Data Science

A rich interactive computing experience is what I love most about Jupyter Notebook. It’s also a convenient web-based environment for exploratory analysis.

In this tutorial, I’d like to show how to enhance the exploratory phase of your project with two interactive data visualization tools that are available as JavaScript libraries. The guide takes only a few steps to complete.

I believe data reporting should be comprehensible and take less time than the analysis itself. That’s why we will set up a working environment for reporting within Jupyter Notebook quickly and without installing any extensions.

Hopefully, the resulting notebook template will help everyone who works with data to answer important domain-specific questions and present data analysis insights in a comprehensible form.

Let’s start!

Data

As a data sample, we’ll use the World Happiness Report from Kaggle to explore trends in the data, both spatial (across countries) and temporal (across years).

Tools

  • JupyterLab as our environment (the classic Jupyter Notebook works as well)
  • WebDataRocks Pivot Table

This pivot table will handle all the data-related calculations: aggregating, filtering, and sorting data. Its main strengths are interactivity and ease of use. It will also act as the engine of our dashboard by processing the data and passing it to the charts in summarized form.

  • Google Charts

Charts often make patterns easier to spot than raw tables, so we can combine the tabular data representation with interactive charts. Google Charts is a nice option for this purpose: a free JavaScript charting library, loaded from Google’s servers, that provides the basic chart types and can be customized to our needs.

Key ideas

Let’s break down the tutorial into key steps towards building our data reporting solution:

  • Importing required libraries
  • Data preparation
  • Embedding the pivot table and charts in the notebook
  • Sending the data to the table
  • Configuring a report
  • Sending the data from the table to the charts
  • Saving the report
  • Saving the notebook and sharing it

Step 1: Import Python libraries

Here is what each library is used for:

1. IPython.display — an API for display tools in IPython

2. json — a module for serializing and de-serializing Python objects.

3. pandas — a primary library for data manipulation and analysis
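The import cell itself is not shown above, so here is a minimal sketch of what it could look like for the three libraries listed. Picking the `HTML` class out of `IPython.display` is my assumption, based on the fact that the dashboard is rendered as HTML later on.

```python
# Display helper for rendering raw HTML in a notebook cell
from IPython.display import HTML

# Serializing and deserializing Python objects
import json

# Data manipulation and analysis
import pandas as pd
```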

Step 2: Get your data

How you complete this step depends on how you store and access your data.

I used Google Sheets to unite data records from several CSV files that correspond to individual years. Using the approach described in this tutorial, I imported the data from Google Sheets into a pandas DataFrame.

Alternatively, you can simply import the data from the file system using the pd.read_csv() function. To combine information for several years, you can use pd.concat().

Let’s check what our data looks like:
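As a rough sketch of the file-based route, the snippet below reads two per-year CSV sources and stacks them into one frame. The column names and the rows are made up for illustration (they are not taken from the Kaggle dataset), and `io.StringIO` objects stand in for real file paths so the example runs anywhere.

```python
import io

import pandas as pd

# Two tiny in-memory "files" standing in for per-year CSV exports.
# The rows and column names are invented for illustration only.
csv_2018 = io.StringIO("Country,Happiness Score\nAlpha,7.1\nBeta,5.4\n")
csv_2019 = io.StringIO("Country,Happiness Score\nAlpha,7.3\nBeta,5.2\n")

frames = []
for year, source in [(2018, csv_2018), (2019, csv_2019)]:
    # With real data, pass a file path here instead of a StringIO object
    frame = pd.read_csv(source)
    frame["Year"] = year  # remember which file each row came from
    frames.append(frame)

# Stack the yearly frames and renumber the index from 0
df = pd.concat(frames, ignore_index=True)
print(df)
```

Adding the `Year` column before concatenating is a design choice: once the frames are stacked, nothing else tells the rows of different years apart.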
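A quick way to do that is `head()` together with `dtypes`. The frame below is a placeholder with invented rows; with the real data you would call the same methods on your own dataframe.

```python
import pandas as pd

# Placeholder rows; the real frame would hold the Kaggle data
df = pd.DataFrame({
    "Country": ["Alpha", "Beta", "Gamma"],
    "Year": [2018, 2018, 2019],
    "Happiness Score": [7.1, 5.4, 6.0],
})

preview = df.head(2)  # first two rows only
print(preview)
print(df.dtypes)      # column types, handy for spotting numbers parsed as text
```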

Now that the data is loaded to the dataframe using the approach that fits you best, convert it to JSON with the “records” orientation specified. The resulting list-like structure is the data format that the pivot table understands:
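A minimal sketch of the conversion, using a placeholder frame with invented rows. `DataFrame.to_json(orient="records")` returns a string that holds a JSON array with one object per row.

```python
import json

import pandas as pd

# Placeholder rows; replace with the dataframe loaded in the previous step
df = pd.DataFrame({
    "Country": ["Alpha", "Beta"],
    "Year": [2018, 2018],
    "Happiness Score": [7.1, 5.4],
})

# One JSON object per row, keyed by column name
json_data = df.to_json(orient="records")

# Parsing the string back shows the list-of-dictionaries structure
records = json.loads(json_data)
print(records[0])
```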

Step 3: Create a pivot table and feed it with the data

Next, define a pivot table object with the report configuration specified:
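The sketch below shows one way such an object might look as a Python dictionary. The property names follow my understanding of the WebDataRocks report format, so please check them against the official documentation. The container id, the field names, the sample row, and the formatting values are all assumptions made for illustration.

```python
import json

# Stand-in for the JSON string produced from the dataframe (invented row)
json_data = '[{"Country": "Alpha", "Year": 2018, "Happiness Score": 7.1}]'

webdatarocks = {
    "container": "#pivot-container",  # selector of the <div> that will host the table
    "width": "100%",
    "height": 430,
    "toolbar": True,
    "report": {
        "dataSource": {
            "dataSourceType": "json",
            "data": json.loads(json_data),  # list of row dictionaries
        },
        "slice": {
            "rows": [{"uniqueName": "Country"}],
            "columns": [{"uniqueName": "Measures"}],
            "measures": [
                {"uniqueName": "Happiness Score", "aggregation": "average"}
            ],
        },
        # Number formatting: the empty name marks the default format
        "formats": [{"name": "", "decimalPlaces": 2}],
        # Conditional formatting: highlight cells above a threshold
        "conditions": [
            {"formula": "#value > 7", "format": {"backgroundColor": "#C8E6C9"}}
        ],
    },
}
```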

Here is what we’ve configured so far:

  • A data slice — a subset of fields we want to present on the grid.
  • A data source and its type. Using the json.loads() function, we deserialized a string containing a JSON document into a Python object.
  • Optional view-related settings: conditional formatting, number formatting.

Next, let’s convert the pivot table, represented as a Python dictionary, to a JSON-formatted string:
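A minimal sketch of this conversion with `json.dumps()`. The dictionary here is a shortened, hypothetical stand-in for the full configuration, and the variable names are my own.

```python
import json

# Shortened stand-in for the full pivot table configuration
webdatarocks = {
    "container": "#pivot-container",
    "toolbar": True,
    "report": {
        "dataSource": {"data": [{"Country": "Alpha", "Happiness Score": 7.1}]}
    },
}

# Python literals are translated to their JSON counterparts (True becomes true)
webdatarocks_json_object = json.dumps(webdatarocks)
print(webdatarocks_json_object)
```

The resulting string is valid JavaScript object syntax, which is what makes it possible to paste it straight into a script tag in the next step.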

Step 4: Render the pivot table and charts in HTML

Now comes the most important step.

Let’s define a function that will render the pivot table and charts in the notebook cell.
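Here is a sketch of such a function under a few assumptions: the function name, the container ids, and the `__REPORT__` and `__CUSTOM_JS__` placeholders are hypothetical, and the script URLs are the CDN addresses I know from the WebDataRocks and Google Charts documentation. This version returns a plain string so that it can be inspected outside a notebook too.

```python
def build_dashboard_html(report_json, custom_js=""):
    """Return an HTML snippet that loads both libraries and creates the pivot table.

    report_json: the pivot table configuration as a JSON string.
    custom_js:   extra JavaScript to run after the pivot table is created.
    """
    layout = """
<link href="https://cdn.webdatarocks.com/latest/webdatarocks.min.css" rel="stylesheet"/>
<script src="https://cdn.webdatarocks.com/latest/webdatarocks.toolbar.min.js"></script>
<script src="https://cdn.webdatarocks.com/latest/webdatarocks.js"></script>
<script src="https://cdn.webdatarocks.com/latest/webdatarocks.googlecharts.js"></script>
<script src="https://www.gstatic.com/charts/loader.js"></script>
<div id="pivot-container"></div>
<div id="googlechart-container" style="height: 400px;"></div>
<script>
  var pivot = new WebDataRocks(__REPORT__);
  __CUSTOM_JS__
</script>
"""
    # Plain string replacement avoids clashes between JavaScript braces
    # and Python's str.format placeholders
    return layout.replace("__REPORT__", report_json).replace("__CUSTOM_JS__", custom_js)


html = build_dashboard_html('{"container": "#pivot-container"}', "console.log('ready');")
print(html)
```

In a notebook, wrapping the result as `HTML(html)` (with `HTML` imported from `IPython.display`) and leaving it as the last expression of a cell should render the layout.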

Here, we’ve specified the HTML layout that contains the scripts of Google Charts and WebDataRocks, CSS styles, and containers where the pivot table and chart instances will be rendered. The entire layout is enclosed in triple quotes.

Additionally, we pass an argument that holds custom JavaScript code to execute. In this piece of code, we implement the logic of how and when the chart should be drawn. This should happen only when the pivot table is fully rendered and loaded with data and localization files. We can track this moment with the “reportcomplete” event.

Here’s the code that does all the heavy lifting of getting data from the table, creating a chart, and passing the summarized data to it:
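Below is a sketch of what that JavaScript might look like, kept in a Python string so it can be handed to the rendering function. It assumes the pivot table instance is stored in a variable named `pivot`, that the chart container has the id `googlechart-container`, and that the WebDataRocks connector for Google Charts is loaded. The chart type and title are arbitrary choices, so treat this as an outline rather than the exact code.

```python
# JavaScript kept in a Python string so it can be passed to the rendering function
chart_js = """
pivot.on("reportcomplete", function () {
  // Run once: detach the handler so later events do not reload the chart loader
  pivot.off("reportcomplete");
  google.charts.load("current", {packages: ["corechart"]});
  google.charts.setOnLoadCallback(function () {
    // First callback draws the chart, second redraws it when the report changes
    pivot.googlecharts.getData({type: "column"}, drawChart, drawChart);
  });
});

function drawChart(chartData) {
  // Convert the summarized data into a Google Charts data table
  var table = google.visualization.arrayToDataTable(chartData.data);
  var chart = new google.visualization.ColumnChart(
    document.getElementById("googlechart-container"));
  chart.draw(table, {title: "Average happiness score by country"});
}
"""

print(chart_js)
```

Passing the same function as both the callback and the update handler is what keeps the chart in sync with the grid: the connector calls it again whenever the report is filtered, sorted, or rearranged.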

Finally, let’s call the rendering function and see the resulting dashboard:

Results

As you can see, the pivot table is filled with data, and the fields are displayed and formatted according to the report rules we set.

Notice that the dashboard itself is interactive: you can play around with different elements of the reporting tool, slice and dice the data records on the grid, filter, and sort them. Moreover, the chart reacts to every change in the report.

What’s next?

Now you have a reporting tool that allows you to look at data from multiple angles. Take some time to explore the data: change tracked metrics, filter the data. Try adding other charting visualizations.

You can experiment with other data sets and seek insights. I hope you’ll enjoy the process as well as the results and boost your notebook productivity with this approach. After you’re done with data analysis, save the notebook and share it with friends or colleagues. Tell the story about your data. They are gonna love it!

Passionate about mathematics, machine learning, and technologies. Studying approaches in the field of data analysis and visualization. Open for new ideas :)