Combining Python and d3.js to create dynamic visualization applications

Kanishka Narayan
Towards Data Science
Sep 26, 2019


The visual form is hypnotic and arresting, unlike any other medium. A painting or an image forces the eyes to see the full picture and presents a form that is free of the constraints of time. It is no wonder that visuals help in adopting a non-linear perspective while trying to understand and solve complex problems.

Problem solving through data analysis and programming, on the other hand, is still very much rooted in the linear perspective, since it involves a step-by-step breakdown of data to understand and solve a problem. However, data analysis done correctly allows a user to account for minute details and idiosyncrasies that are usually missed when looking at the whole picture. Combining a data-driven approach with a more visually oriented one provides a holistic approach to problem solving and analysis that brings together linear and non-linear perspectives. In this article, I explain through a detailed, reproducible example how a user can combine Python (a powerful programming language for data processing) and d3.js (a powerful JavaScript library for generating visuals) to create a visualization application that provides useful insights for problem solvers.

I will demonstrate how a user can build a data processing back end in Python while maintaining a visual front end in d3.js to create an effective application. We will also add some controls so that the front end and the back end can communicate with each other based on inputs from the end user. The Python web framework we will use is Flask, which acts as the intermediary between the back end and the front end. The d3 visualization I have chosen is the collapsible (hierarchical) bar chart example created by Mike Bostock. Creating the visualization structure will involve some HTML, JavaScript and Jinja code.

The problem

We will be using agricultural production data from the FAOSTAT database, which provides data for 213 regions across different years on several variables, disaggregated by crop type, meat type and fish type. We will explore the aggregations and disaggregations in the FAOSTAT data across countries and over time through a dynamic visualization application. You can find the edited data sets used for this example here. We will use two data sets: one on production, disaggregated by different types of crops, meat and fish, and one on agricultural losses, disaggregated by the same categories. Both data sets contain data for 213 regions from 2010 to 2013. We will create an application that helps a user compare the losses and production for any category or sub-category using collapsible bar charts. The user should also be able to select any country and year and create the visualizations for that selection. The end (edited) product will look like the image and GIF below.

[GIF: demonstration of the application running on a local machine]

Part 1: Defining the structure of the application

We will create the front end as an HTML page that hosts our visualization and d3.js scripts, and we will send data to this page from Python code contained in a file called ‘application.py’. The application is structured on disk as follows:

#Application file structure
root/
    static/          This folder contains your supplementary JavaScript files
    templates/       This folder contains the index.html file
    application.py   The main Python file (the data files are also hosted in the root folder)
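If you would rather create this layout from a script than by hand, a minimal sketch using Python's pathlib might look like the following. The helper name `scaffold` is hypothetical, and it only creates empty placeholders for `application.py` and `templates/index.html`; the data files still have to be copied into the root folder yourself.

```python
from pathlib import Path


def scaffold(root: Path) -> list[Path]:
    """Create the folder layout described above under `root` (hypothetical helper)."""
    root = Path(root)
    # static/ holds supplementary JavaScript files, templates/ holds index.html
    for folder in ("static", "templates"):
        (root / folder).mkdir(parents=True, exist_ok=True)
    # Empty placeholder files for the main Python file and the HTML template
    files = [root / "application.py", root / "templates" / "index.html"]
    for f in files:
        f.touch(exist_ok=True)
    return files
```

Calling `scaffold(Path("root"))` twice is harmless, since both `mkdir` and `touch` are told to accept paths that already exist.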

Below is a diagrammatic representation of the application.

[Figure: diagram of the application. Source: Author’s conception]

Part 2: Defining the front end (HTML, d3.js)

First, let’s design the front end, a basic HTML page (“index.html”) that hosts our d3 visualization along with a form where a user can submit a country and year selection. The final HTML is hosted here. I won’t walk through basics like the CSS and formatting; you can take those directly from the HTML or customize them to your preferences. The steps to create the basic HTML page are as follows:

1. Load all the required scripts.

2. Create a form where the user can change the country and year selection.

3. Create “div” elements to host the visualizations.

4. Insert the d3 code to create the graphs. We will also link the Python back end and the d3 code using Jinja.

You will need d3 version 3 (d3.v3.min.js), which you can load into the HTML with this script tag:

<script src="https://d3js.org/d3.v3.min.js"></script>

Let’s first create the form where the user can submit country and year information, using a few lines of HTML. Note that the names assigned below, such as “Country_field” and “Year_field”, are important, since the Python back end references them.

<form method="post">
  <input name="Country_field" placeholder="Enter country name">
  <input type="number" name="Year_field" placeholder="Enter Year">
  <input type="submit" value="Submit Country and Year">
</form>

Now, we will create two div elements, one to host the production graph on the left and one to host the loss graph on the right. Each div should also display the selected country and year. We have to give each div an id for its graph and write some Jinja code to get the country name and the year: Jinja uses double curly brackets, {{ }}, to insert values passed from Python. Assigning a class to the divs makes it easy to add formatting later. The code for the production div is:

<div class="left-div" id="graphDiv" style="border: thin solid black">
<p>
<b><i>Production data by category and sub-categories in 1000 tonnes for {{CountryName}} in {{Year}} (Click on the bars to explore sub-divisions)</i></b>
<br> <br>
<i>FAO defines production as “Unless otherwise indicated, production is reported at the farm level for crop and livestock products (i.e. in the case of crops, excluding harvesting losses) and in terms of live weight for fish items.”</i>
</p>
</div>

The above code produces this:

[Figure: Production division in HTML]

We repeat the same code to create another div for the loss data. After that, let’s define the d3 code that creates the plots. I have defined two svg objects, svg1 and svg2, for the left and the right plot respectively, and used the code from Mike Bostock’s example here mostly as is, with a couple of changes. First, where the svg object is defined, we reference our div ids so that d3 draws each graph inside the divs we made above. In our example, this becomes:

var svg1 = d3.select("#graphDiv")
    .append("svg")
    .attr("width", width + margin.left + margin.right)
    .attr("height", height + margin.top + margin.bottom)
  .append("g")
    .attr("transform", "translate(" + margin.left + "," + margin.top + ")");

As mentioned above, the back end data processor is written in Python, so we have to pass the data from Python to the d3 script using the code below. “/get-data” is a route (URL) that we will define in our Python code later; “d3.json” requests that URL, parses the JSON response and passes it to the callback function.

// The loss chart's script requests "/get-loss-data" instead
d3.json("/get-data", function(error, root) {
  if (error) throw error;
  partition.nodes(root);
  x.domain([0, root.value]).nice();
  down(root, 0);
});
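To make the d3 side less of a black box: the back end built in Part 3 returns nested objects with `name` and `children` keys, and a `size` key on the leaves. Assuming the partition layout is set to read `size` (as in Bostock’s flare.json-based example), each parent bar’s length is the sum of its children’s sizes, and `root.value` in `x.domain` is the grand total. The hypothetical helper below reproduces that aggregation in Python, which can be handy for checking what `/get-data` should draw; the numbers are made up.

```python
def total_value(node: dict) -> float:
    """Sum leaf 'size' fields bottom-up, mirroring what the d3 partition layout computes."""
    children = node.get("children")
    if not children:
        # Leaf node: its value is its own size
        return node.get("size", 0)
    # Parent node: its value is the sum of its children's values
    return sum(total_value(child) for child in children)


# Made-up hierarchy in the same name/children/size shape the back end produces
example = {
    "name": "flare",
    "children": [
        {"name": "Cereals", "children": [{"name": "Wheat", "size": 3.0}, {"name": "Rice", "size": 2.0}]},
        {"name": "Meat", "children": [{"name": "Beef", "size": 1.5}]},
    ],
}
print(total_value(example))  # 6.5
```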

Finally, we make a small tweak to the bar colors: we want green bars for the production graph and blue bars for the loss graph. We change the color by editing the range of the color scale in the code below:

var color = d3.scale.ordinal()
    .range(["green", "#ccc"]);  // in the loss chart's script, use "blue" instead of "green"

Part 3: Creating the back end in Python (Flask)

The steps for creating the Python file are a bit more time-consuming. The final application file is available here. We will perform the following steps:

1. Import the necessary packages, define the application in Flask and create a datastore.

2. Create the code to generate data to send to the front end for the home page.

3. Convert data into json format for d3 and send the same to the front end

4. Similarly, define functions specifically for the production and loss graphs.

OK, let's get the easy stuff out of the way: import the packages, define the Flask application and create a DataStore class with four attributes. The datastore will help later on to save data before passing it to the front end. We create a ‘CountryName’ attribute and a ‘Year’ attribute, both of which the user sends to the application through the form, a “Prod” attribute that stores the production data, and a “Loss” attribute that stores the loss data.

# 1. Import packages
from flask import Flask, render_template, request, jsonify
import pandas as pd
import json


# 2. Declare the application
app = Flask(__name__)

# 3. Create a datastore that keeps the latest selection and processed data between requests
class DataStore():
    CountryName = None
    Year = None
    Prod = None
    Loss = None

data = DataStore()
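The DataStore class above relies on class attributes that are then overwritten on the `data` instance. As an alternative sketch (not the author’s code), a standard-library dataclass declares the same four fields with defaults and type hints, and gives a readable printout:

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class DataStore:
    """Holds the latest selection and the processed hierarchies between requests."""
    CountryName: Optional[str] = None
    Year: Optional[int] = None
    Prod: Optional[dict[str, Any]] = None  # nested production data served at /get-data
    Loss: Optional[dict[str, Any]] = None  # nested loss data served at /get-loss-data


data = DataStore()
data.CountryName = "India"
data.Year = 2013
print(data)  # DataStore(CountryName='India', Year=2013, Prod=None, Loss=None)
```

Either way, `data` is a single module-level object, so every request the server handles reads and writes the same selection. That is fine for a single-user demo like this one, but concurrent users could overwrite each other’s choices.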

Now, let's define the main page of the application. We first define the route to the main page and a homepage function that creates the data for it: whenever a user visits the main page, Flask calls the homepage function. We read the values submitted from the front end through Flask’s “request” object. Note that request.form is keyed by the name attributes defined in the HTML form, ‘Country_field’ and ‘Year_field’. We also set default values of India for the country and 2013 for the year, and store the requested values in the datastore attributes ‘CountryName’ and ‘Year’. Finally, we read in the production data and create local variables called CountryName and Year for our analysis; these local variables, not the datastore, are what get passed to the HTML. A good way to think of this is that the datastore is Python’s internal memory, updated with every request, while the local variables are a snapshot taken at a single point in time to be passed to the front end. Since each visualization shows a single point in time, it uses these local variables.

# Define the route for the main page; it accepts both GET and POST requests.
@app.route("/", methods=["GET", "POST"])
def homepage():
    # Read the CountryName and Year submitted through the form defined in the HTML;
    # fall back to India and 2013 when the form has not been submitted yet (first GET).
    data.CountryName = request.form.get('Country_field', 'India')
    data.Year = request.form.get('Year_field', 2013)

    df = pd.read_csv('CropsFull.csv')
    CountryName = data.CountryName
    Year = data.Year
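One subtlety with `request.form.get(name, default)`: the default only applies when the field is absent, for example on the first GET. If the user submits the form with an empty box, the browser still sends the field as an empty string, so the default is skipped. A small hypothetical helper, `read_selection`, that treats blanks as missing and converts the year could look like this; it takes a plain dictionary in place of `request.form`, which supports the same `.get` call.

```python
from collections.abc import Mapping
from urllib.parse import parse_qsl

DEFAULT_COUNTRY = "India"  # same defaults as in homepage()
DEFAULT_YEAR = 2013


def read_selection(form: Mapping) -> tuple[str, int]:
    """Return (country, year) from submitted form fields, falling back to the defaults.

    `form` stands in for Flask's request.form; any mapping with .get() works here.
    """
    # A blank input is submitted as "", so `or` treats it like a missing field
    country = form.get("Country_field") or DEFAULT_COUNTRY
    year = form.get("Year_field") or DEFAULT_YEAR
    # Submitted values are strings; int() also accepts the integer default
    return country.strip(), int(year)


# Simulate a browser POST body where the user left the year box empty
form = dict(parse_qsl("Country_field=Kenya&Year_field=", keep_blank_values=True))
print(read_selection(form))  # ('Kenya', 2013)
print(read_selection({}))    # ('India', 2013)
```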

Now, we filter the data frame (df) for the values received from the form, which is a straightforward filter in pandas, and keep only the columns needed for further processing. I convert the Year variable to an integer because values submitted through a form always arrive as strings (only the default, 2013, is an integer), and comparing a string with the integer Year column would match no rows.

    # Filter the data frame (df) to the selected country and year
    df = df[df.Country == CountryName]
    df = df[df.Year == int(Year)]
    # Keep only the columns needed to build the hierarchy
    df = df[["Category", "Cat", "value"]]
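For readers who want to try the filtering logic without pandas, here is a standard-library stand-in using `csv.DictReader`. This is a swapped-in technique, not the pandas code above; the column names follow the code above, while the sample rows and the helper name `filter_rows` are made up. With `csv`, every field arrives as a string, so both the file’s Year and the submitted Year are converted to integers before comparing.

```python
import csv
import io

# Made-up rows, assuming CropsFull.csv has at least these columns
SAMPLE_CSV = """Country,Year,Category,Cat,value
India,2013,Cereals,Wheat,3.0
India,2013,Cereals,Rice,2.0
India,2012,Cereals,Wheat,2.5
Kenya,2013,Meat,Beef,1.5
"""


def filter_rows(csv_text, country, year):
    """Return (Category, Cat, value) rows for one country and year (stdlib stand-in for the pandas filter)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (row["Category"], row["Cat"], float(row["value"]))
        for row in reader
        # csv yields every field as a string, so convert both years to int before comparing
        if row["Country"] == country and int(row["Year"]) == int(year)
    ]


print(filter_rows(SAMPLE_CSV, "India", "2013"))
# [('Cereals', 'Wheat', 3.0), ('Cereals', 'Rice', 2.0)]
```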

Now, we need to convert this data frame into a nested (layered) JSON. The JSON is nested according to the aggregation categories in the data, which is the hierarchy the collapsible bar chart expects. The code is below; it is adapted from Andrew Heekin’s code for creating nested JSONs, which can be found here. I will not go into its details here.

    # Sum values for each (Category, Cat) pair and turn the group keys back into columns
    df1 = df.groupby(['Category', 'Cat'])['value'].sum()
    df1 = df1.reset_index()
    # Root of the nested dict that d3 will read
    d = {"name": "flare", "children": []}

    for line in df1.values:
        Category = line[0]
        Cat = line[1]
        value = line[2]

        # Make a list of the category names already added
        keys_list = []
        for item in d['children']:
            keys_list.append(item['name'])

        # If the Category is NOT in the dict yet, append it with its first child
        if Category not in keys_list:
            d['children'].append({"name": Category, "children": [{"name": Cat, "size": value}]})

        # If the Category IS already in the dict, add a new child to it
        else:
            d['children'][keys_list.index(Category)]['children'].append({"name": Cat, "size": value})

    flare = d
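As an alternative to the key-list approach above (this is not Andrew Heekin’s code), the same nesting can be built with a dictionary that maps each category name to its node, which avoids rebuilding `keys_list` for every row. The sketch below is self-contained and also performs the summing that `groupby(...).sum()` does; the function name `build_flare` and the sample rows are hypothetical.

```python
from collections import defaultdict


def build_flare(rows, root_name="flare"):
    """Nest (Category, Cat, value) rows into the name/children/size layout built above.

    Values for repeated (Category, Cat) pairs are summed, like the groupby().sum() step.
    """
    totals = defaultdict(float)
    for category, cat, value in rows:
        totals[(category, cat)] += float(value)

    # Map each category name to its node so each row needs only one dictionary lookup
    nodes = {}
    for (category, cat), size in totals.items():
        node = nodes.setdefault(category, {"name": category, "children": []})
        node["children"].append({"name": cat, "size": size})
    return {"name": root_name, "children": list(nodes.values())}


rows = [("Cereals", "Wheat", 3.0), ("Cereals", "Rice", 2.0), ("Meat", "Beef", 1.5), ("Cereals", "Wheat", 1.0)]
print(build_flare(rows))
```

The same helper can be applied to the filtered loss rows to build the Loss hierarchy. One difference from pandas: `groupby` sorts its keys by default, whereas this version keeps categories in the order they first appear.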

Next, we serialize this dictionary to a JSON string with json.dumps and load it back with json.loads. As mentioned above, we save the result both to a temporary variable ‘Prod’, which is passed to the front end, and to ‘data.Prod’, the datastore attribute that keeps it in Python’s memory.

    # Dump data to json
    flare = json.dumps(flare)
    # Save to datastore
    data.Prod = json.loads(flare)
    # Save to temporary variable
    Prod = data.Prod
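Because `flare` starts out as a Python dictionary, the `json.dumps`/`json.loads` round trip does not change its content; what it does buy is an early failure if anything in it cannot be serialized, because `json.dumps` raises `TypeError` for types such as sets or NumPy integer scalars. A short illustration with a hypothetical helper, `json_roundtrip`:

```python
import json


def json_roundtrip(obj):
    """Serialize and reload `obj`, failing early if it contains non-JSON types."""
    return json.loads(json.dumps(obj))


nested = {"name": "flare", "children": [{"name": "Cereals", "children": [{"name": "Wheat", "size": 3.0}]}]}
# Plain dicts, lists, strings and floats survive the round trip unchanged
assert json_roundtrip(nested) == nested

# Tuples come back as lists
print(json_roundtrip({"pair": ("a", 1)}))  # {'pair': ['a', 1]}

try:
    json_roundtrip({"tags": {"crop"}})  # sets are not JSON-serializable
except TypeError as err:
    print("not serializable:", err)
```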

We process the loss data with the same steps, so I won’t repeat the entire code here. The last lines of the loss code are:

    # Dump data to json
    flare = json.dumps(flare)
    # Save to datastore
    data.Loss = json.loads(flare)
    # Save to temporary variable
    Loss = data.Loss

Finally, let’s wrap up the function with a return statement. We use Flask’s ‘render_template’ function to send the data to our front end (the ‘index.html’ file), passing along all our temporary variables: CountryName, Year, and the production and loss data.

    return render_template("index.html", CountryName=CountryName, Year=Year, Prod=Prod, Loss=Loss)

The above code sends data to the main page. We also need two more functions that send the production and loss data to our d3 scripts. Since the datastore remembers the production and loss data, this is fairly simple. Let’s define a route called “/get-data” that returns our production data. Note that the function returns a ‘jsonified’ version of the data, i.e. a JSON response that d3.json can read.

@app.route("/get-data", methods=["GET", "POST"])
def returnProdData():
    # Send the production hierarchy stored by homepage() to d3.json("/get-data")
    f = data.Prod
    return jsonify(f)

We will create a similar function for the loss data at a route called ‘/get-loss-data’.

@app.route("/get-loss-data", methods=["GET", "POST"])
def returnLossData():
    # Send the loss hierarchy stored by homepage() to the loss chart
    g = data.Loss
    return jsonify(g)

Finally, let’s define the code to run the app:

if __name__ == "__main__":
    app.run(debug=True)

There you have it: your application is ready! Go ahead and run it. When you run the code, you should see the following message with a link to the application on your local machine.

* Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)

This application is easy to deploy on servers. I have deployed it on a free Heroku server, which can be accessed here. Note that because I am using the free version of Heroku, the load time is a bit slow (you may have to refresh the application a couple of times). I have also added the requirements.txt, .gitignore and Procfile in case you would like to deploy it yourself to Heroku or another server. The code can also be adapted to other d3 visualizations that you like!

Many thanks to Mike Bostock for creating a wonderful library like d3, and to Andrew Heekin for writing the code that generates layered JSONs. Thank you to David Bohl and Aditya Kulkarni for their feedback and comments.

Below are links to the GitHub repository and other sources for your reference and convenience.

1. GitHub project: https://github.com/kanishkan91/FAO-FBS-Data-Explorer

2. Application deployed on a Heroku server: https://faoexplorer-flask-d3.herokuapp.com/

3. Mike Bostock’s collapsible bar chart example: https://observablehq.com/@d3/hierarchical-bar-chart

4. Andrew Heekin’s code for creating layered JSON: https://github.com/andrewheekin/csv2flare.json/blob/master/csv2flare.json.py
