Multi-faceted data exploration in the browser using Leaflet and amCharts

Creating a data exploration dashboard in the browser, loading records live from Google Sheets and cross-referencing geo-spatial information with chronological data.

Sergio Marchesini
Towards Data Science

--

Demo: https://smarques.github.io/mufa-leaf/
(The data in the demo is randomised, and the real study has not been published yet.)

In these challenging Covid times I was asked to work on a data visualisation concerning the area of Italy where I live.

The data consisted of about 20k records, geo-referenced and labeled by sub-sector, type of subject, data source, and the year the activity started.

Example:

subsector => SUBSEC1,
subject => SUBJ2,
year => 1985,
position => [45.375302, 11.727905],
source => Gov. Census

My goal was to make this data explorable in different ways:
- as marks and clusters on a geographical map
- as a navigable TreeMap
- through filters and a timeline selector

Ideally these visualisations should apply either to the whole set or to a selected subset of the data, so that you can apply filters and see, for example, how the distribution looked in a specific year, and perhaps only for a specific sub-sector of a given province.

The idea is to load the data in the browser as a JSON structure and keep it in memory, extracting subsets based on the user's interaction with the filters and feeding the subset data to each visualisation.

I will go through some of the process here; please refer to the GitHub repo for the details of the implementation: https://github.com/smarques/mufa-leaf

Preparing the data

Since the data was still under revision and re-categorisation, I wanted to load it from a shared Google Sheet.
I also used Google Sheets to geocode the addresses, using this nice Google Sheets macro.

Because people were still working on the Google Sheet, I decided to do data checks and data cleaning directly when importing the data into my JS app.

To load data from a Google Sheet the best option I found is Papa Parse. It lets you parse CSV to JSON and back, load it from anywhere, handle big files, the whole deal, plus it has a funny name.

Let’s try it out.

First of all you need to perform two actions on your Google Sheet:

  • Publish it to the web: File -> Publish to the web, then copy the web address. It will look something like: https://docs.google.com/spreadsheets/d/e/<hash>/pubhtml
  • Share it: click the Share button at the top right, then click Get shareable link. This turns link sharing on, and ANYONE with the link will be able to see the data.

The URL you need to feed to Papa Parse (jeez, the name is really funny to me) is https://docs.google.com/spreadsheets/d/e/<hash>/pub?output=csv

You could also export your data as JSON directly from Google Sheets, but our friend Papa gives us a lot more options for type casting, data cleaning, etc.

When you are done with this you can just import your data with:

https://gist.github.com/smarques/3dbfaaae4b3d8a204d2ac280f488528e
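
In case the gist does not render, here is a minimal sketch of that import; the Papa Parse options are the standard ones for remote CSV loading, the exact setup in the repo may differ, and the sheet hash stays a placeholder:

Papa.parse('https://docs.google.com/spreadsheets/d/e/<hash>/pub?output=csv', {
  download: true,      // fetch the CSV from the URL instead of parsing a string
  header: true,        // use the first row as field names
  dynamicTyping: true, // cast numbers automatically (handy for the year column)
  skipEmptyLines: true,
  complete: function (results) {
    App.data = processData(results.data); // clean-up step described below
    updateCurrentData(App.data);          // fill currentData and draw everything
  }
});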

The processData function is in charge of cleaning up and refining the data we receive from Google Sheets. Sometimes you need to apply a regex, remove weird characters, or fix input errors.

Creating a clustered map

The goal for now is to get those guys on a map, possibly on three different layers and with different marker colours per layer. We would also like the markers to have a different icon for each subject type. Since we have almost 30k markers, it might be a good idea to set up some clustering.

My choice of map library is Leaflet. It has great code and documentation, it’s very flexible and it has a nice plugin ecosystem.

I added a few plugins:
Leaflet Full Screen Control, because I just love having maps go full screen.
Leaflet Easy Print: there are a few printing plugins, but this one seemed to work best for me.
Leaflet Providers, to easily mix and match tiles from different sources. I must mention the Stamen Watercolor tiles because they are really a joy for the eyes.
Leaflet Awesome Markers, which lets you use Font Awesome icons on your map markers.
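
To give an idea of how these fit together, here is a rough setup sketch, assuming the leaflet-providers and Leaflet.awesome-markers plugins are loaded; the map centre, tile set, icon and colour are just examples:

var map = L.map('map').setView([45.4, 11.7], 9);          // example centre and zoom
L.tileLayer.provider('Stamen.Watercolor').addTo(map);     // any provider string works here

// One Awesome Marker style per subject type, using Font Awesome icons.
var subjectIcon = L.AwesomeMarkers.icon({
  prefix: 'fa',        // Font Awesome
  icon: 'industry',    // example icon for one subject type
  markerColor: 'red'   // example colour for one sub-sector layer
});
L.marker([45.375302, 11.727905], { icon: subjectIcon }).addTo(map);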

Now, the way you would normally go if you just have to display and possibly filter markers on a Leaflet map is to add the markers to their corresponding layers, then call functions on the map to hide or show whole layers or specific markers. My idea is a little different: I want to keep all the data at the application level so I can use it not only for the map but for other graphs and widgets as well. When we apply a filter, I want the data to be filtered at the application level and the map to be redrawn accordingly, along with any other data-dependent widget in the application.

The question at this point is: will the map and graphs redraw fast enough to make for a nice user experience and data exploration?
Let’s find out. (Spoiler: yayyy they do)

So we have a variable in the application called currentData that holds the currently displayed subset of the full recordset. Every time the user changes the selection filters, we create a new data selection starting from App.data, save it into App.currentData, and trigger a redraw of every widget.

updateCurrentData(App.data);

Even when we are just initialising the application after cleaning the data, we can call the update function to fill the application's currentData: since the user has not had a chance to filter the recordset yet, the function will simply select every record and we are ready to go.
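
Schematically, the update function can look something like this; the helper names are placeholders for the filtering step and the map, TreeMap and timeline updates described below:

function updateCurrentData(data) {
  // Apply the currently selected filters (none at start-up, so every record passes).
  App.currentData = applyFilters(data);

  // Redraw every data-dependent widget from the same subset.
  redrawMapLayers(App.currentData);
  redrawTreeMap(App.currentData);
  updateTimelineBounds(App.currentData);
}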

Let’s see what to do with the map widget for now.

Without clustering the map would be a little crowded: not very useful and also very heavy on your browser.

So let’s add a level of clustering using Leaflet.MarkerCluster.

I structured the markers according to the sub-sector they belong to. There are only 4 possible values here, so having each sub-sector on a different layer makes it very easy to switch each group on and off. It also makes it possible to have separate clusters for each sub-sector, making the display more interesting and giving better clues about the data distribution.

MarkerCluster allows you to customise the function that displays the cluster markers (hence the name...), so I have slightly different markers depending on the size of the cluster (see, for example, the 9 vs the 325 red clusters).

For each of the four sectors we instantiate a marker cluster group, passing an iconCreateFunction as discussed. I also added a mouseover handler to the cluster icons so you can float a layer up (by changing its z-index) by hovering the mouse over it.
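
A trimmed-down sketch of one such group; the size threshold and CSS class are illustrative, and bringLayerToFront stands in for the real z-index handling:

var clusterGroup = L.markerClusterGroup({
  iconCreateFunction: function (cluster) {
    // Grow the icon with the number of markers it contains.
    var count = cluster.getChildCount();
    var size = count > 100 ? 50 : 36;
    return L.divIcon({
      html: '<div><span>' + count + '</span></div>',
      className: 'marker-cluster marker-cluster-subsec1', // one CSS class per sub-sector
      iconSize: L.point(size, size)
    });
  }
});

// Float the hovered sub-sector above the others.
clusterGroup.on('clustermouseover', function () {
  bringLayerToFront(clusterGroup); // hypothetical helper that bumps the layer's z-index
});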

I also wanted to highlight the area the study refers to. Leaflet lets you load GeoJSON, so all I had to do was search GitHub for a GeoJSON file with the region's shape and load it on its own layer (thanks to Stefano Cudini).

L.geoJson(venetoData).addTo(App.mainMap);

Filters!

I want all the filters to be integrated in the map, so I am adding a side panel on the right that lets you open different panels with filter controls.

I created filters for whole sub-sectors or for individual subjects within a sector, then filters for each province and for each data source.

I won't go into the implementation details of the filter GUI (you can look them up in the GitHub repo). Basically, to avoid callback hell, every GUI element is connected to an event stream using bacon.js, a functional reactive library. This way any change in any filter results in the same update function being called, passing in the full set of filter values, so that we can apply all the needed conditions to the full data set and load a new currentData subset for each widget to display.
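
A minimal sketch of that wiring, with hypothetical element IDs and an App.currentFilters object standing in for however the repo actually stores the filter state:

// Each filter control becomes a stream of its current value.
var provinceFilter = Bacon.fromEvent(document.getElementById('province-select'), 'change')
  .map(function (e) { return e.target.value; })
  .toProperty('ALL'); // start unfiltered

var sourceFilter = Bacon.fromEvent(document.getElementById('source-select'), 'change')
  .map(function (e) { return e.target.value; })
  .toProperty('ALL');

// Any change on any stream ends up in the same update call.
Bacon.combineAsArray([provinceFilter, sourceFilter]).onValue(function (values) {
  App.currentFilters = { province: values[0], source: values[1] }; // app-level filter state
  updateCurrentData(App.data);
});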

The update function

Since I want to keep the data filtering at the main application level, every time I get a change in the filters I call an update function to extract the relevant data from the full set.
I use a JS library called alasql to process the 15k-record array according to the filters the user selected. Among other things, alasql lets you run SQL queries on any array of objects, which makes it much easier to process your data and keeps your code readable and easy to debug and maintain.

I can run queries like:

var res = alasql(`SELECT MIN(${App.field_names.year}) AS minYear, MAX(${App.field_names.year}) AS maxYear FROM ? WHERE ${App.field_names.year} > 1900`, [App.currentData]);

At this point, all I have to do in the update function is build an array of conditions to add to my SELECT statement.
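
For instance, a sketch of assembling those conditions; App.field_names.province is an assumption following the pattern of the year query above, and the province and year values are read from a hypothetical App.currentFilters object (the year interval comes from the timeline described below):

var filters = App.currentFilters;
var conditions = [];
if (filters.province !== 'ALL') {
  conditions.push(`${App.field_names.province} = '${filters.province}'`);
}
conditions.push(`${App.field_names.year} BETWEEN ${filters.minYear} AND ${filters.maxYear}`);

var where = conditions.length ? ' WHERE ' + conditions.join(' AND ') : '';
App.currentData = alasql('SELECT * FROM ?' + where, [App.data]);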

It turns out MarkerCluster is very fast at updating if you simply remove all the markers from a layer and re-add them, so when I have a new currentData object I just run through it and reassign the markers.
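
Roughly like this, assuming one cluster group per sub-sector is kept in a hypothetical App.clusterGroups map and iconFor picks the right Awesome Marker for a record:

Object.keys(App.clusterGroups).forEach(function (subsector) {
  var group = App.clusterGroups[subsector];
  group.clearLayers(); // wipe the whole group in one go

  var markers = App.currentData
    .filter(function (rec) { return rec[App.field_names.subsector] === subsector; })
    .map(function (rec) { return L.marker(rec.position, { icon: iconFor(rec) }); });

  group.addLayers(markers); // bulk re-add is much faster than adding one by one
});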

Timeline Selector

Next we want to be able to select records based on a year interval.

Leaflet controls are just HTML elements, so I used Ion.RangeSlider and added its value to the reactive stream that triggers a new query and a subsequent redraw.
When I first receive the data from Google Sheets, I grab the min and max values of the year column so I can use them to display a reasonable range on the time selector.
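
A sketch of that wiring, pushing the slider values into a Bacon bus; the element ID and the App.minYear/App.maxYear bounds are illustrative:

var yearBus = new Bacon.Bus(); // merged with the other filter streams

$('#year-slider').ionRangeSlider({
  type: 'double',    // two handles: start and end year
  min: App.minYear,  // bounds computed from the data on load
  max: App.maxYear,
  from: App.minYear,
  to: App.maxYear,
  onFinish: function (data) {
    // Push the selected interval into the same reactive pipeline as the other filters.
    yearBus.push({ minYear: data.from, maxYear: data.to });
  }
});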

TreeMap Graph

As a last element, we add a TreeMap graph so we can visualise the structure of the selected subset. I included the amCharts library and set up a call in my update function to redraw the chart. I chose to start the breakdown by province.

You can then drill down in the visualisation by clicking on the TreeMap blocks.
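
A sketch of that setup, assuming amCharts 4 and a hypothetical groupByProvince helper that nests the current subset by province:

var chart = am4core.create('treemap', am4charts.TreeMap);
chart.dataFields.value = 'count';       // size of each block
chart.dataFields.name = 'name';         // province, then sub-sector, and so on
chart.dataFields.children = 'children'; // nested levels for the drill-down
chart.maxLevels = 1;                    // show only provinces until the user clicks

function redrawTreeMap(currentData) {
  chart.data = groupByProvince(currentData); // build the nested array from the subset
}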

Wrapping it up

So now we have a main application level that is concerned with loading the full data, making sub-selections based on the different filters and user behaviour, and then triggering updates on the various graphs and widgets.
It's a good framework for letting the user explore the data in the browser, and it would be easy to add more visualisations; the whole point is that they all refer to the same data subset.
It's very useful, for example, to see how locations and data structure change when you change the year parameter. You could add a play button and create an animation that shows the changes year by year.
One more improvement would be to sync the data to local storage via IndexedDB (alasql supports it) so that you get a full offline experience.

If you feel like you want to add elements or experiment with it please feel free to fork it or drop me a pull request on GitHub!
