
Interactive COVID19 report with RMarkdown, Plotly, Leaflet and Shiny

Learn how to make interactive reports that can be applied to COVID19 and beyond!

This post is part 3 of a series of 4 publications. Refer to part 1 for an overview of the series, part 2 for an explanation of the data sources and minor data cleaning, part 3 for creating the visualisations, building the report and deploying the document to ShinyApps.io, and part 4 for automatic data update, compilation and publishing of the report. [Project repo]

Each article in the series is self-contained, meaning that you don’t need to read the whole series to make the most out of it.


Table of Contents

  • Goal
      • Requirements
  • Data input
  • Plots
      • Treemap Charts
      • Line Charts
      • Map Charts
      • Enough plotting!
  • Making a Shiny RMarkdown Report
      • Writing the report
      • Uploading your document to the Cloud
      • Publishing your app
  • Conclusion


Goal

In this tutorial, we will be learning how to make 7 kinds of charts to illustrate the progress of COVID19 across nations. We will then integrate these into an RMarkdown document and publish an interactive copy of it for free, hosted on RStudio’s servers.

Requirements

Experience using R, dplyr and ggplot is useful but not essential. Across all my writing/coding I try to make the scripts as human-readable as possible.


Data input

First of all, today we will be using two datasets:

  • Daily count of confirmed COVID cases and COVID deaths by country (which we will call COVID events from here on). To get an idea of what this dataset looks like, see the truncated table below:
  • Cumulative count of confirmed COVID cases and COVID deaths by country. Again, see a truncated version of this table below:

    Note: The tables above are truncated for visualisation purposes. The actual tables contain data from 31 December 2019 onwards and for most countries/regions in the world.


Plots

I will walk you through the 7 charts we will be making. For every plot the process is the same:

  1. Check the input format for the plotting function
  2. Manipulate the input data to get it in the right format
  3. Make plot

There may be some overlap in the data wrangling across plots but that is ok. My aim here is to show how each plot can be made on its own.

Treemap Charts

Treemap charts work great to represent hierarchies and the magnitude of each "child" element in its "parent". Here we are plotting COVID events by country which can further be grouped by continent which can, in turn, be grouped as part of the world. Let’s see how to make the following plot with Plotly.

In our treemap, the size of each box will be proportional to the number of events at the latest date for each country, hence we will be using the cumulative table.

After going through the Plotly R Treemap documentation, two things become clear. First, every element in the treemap (countries, continents and the world) needs a corresponding value. As of now we only have data aggregated at the country level, so we need to aggregate it for each continent and globally. Second, we need a new column indicating the parent of each element: for Spain, Europe; for Europe, World; for World, nothing. Additionally, countries, continents and the "World" all need to sit in the same column, which we will call labels.

Below is the commented script to wrangle the data to the correct specifications:
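As a minimal sketch of this wrangling, assume a cumulative table with the hypothetical columns country, continent, date and deaths. The values below are toy numbers, not real data, and the real table may use different column names.

```r
library(dplyr)

# Toy cumulative table; column names and values are illustrative assumptions
cumulative <- data.frame(
  country   = c("Spain", "Spain", "France", "Japan"),
  continent = c("Europe", "Europe", "Europe", "Asia"),
  date      = as.Date(c("2020-10-30", "2020-10-31", "2020-10-31", "2020-10-31")),
  deaths    = c(10, 12, 8, 5)
)

# Keep only the latest available row for each country
latest <- cumulative %>%
  group_by(country) %>%
  filter(date == max(date)) %>%
  ungroup()

# Country level: the parent of a country is its continent
countries <- latest %>%
  transmute(labels = country, parents = continent, values = deaths)

# Continent level: aggregate the countries, the parent is "World"
continents <- latest %>%
  group_by(continent) %>%
  summarise(values = sum(deaths), .groups = "drop") %>%
  transmute(labels = continent, parents = "World", values)

# Root level: the world has no parent
world <- data.frame(labels = "World", parents = "", values = sum(latest$deaths))

# One table with every element, its parent and its value
treemap_data <- bind_rows(world, continents, countries)
```

Every row now carries its own value and the label of its parent, which is the shape the treemap trace expects.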

Now that we have the data in the right format, making the plot is super easy:
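A sketch of the plotting call could look like the following. The labels/parents/values table is a toy stand-in for the wrangled data, and the object name treemap_deaths is an assumption.

```r
library(plotly)

# Toy table in the labels/parents/values layout (illustrative values only)
treemap_data <- data.frame(
  labels  = c("World", "Europe", "Asia", "Spain", "France", "Japan"),
  parents = c("", "World", "World", "Europe", "Europe", "Asia"),
  values  = c(25, 20, 5, 12, 8, 5)
)

treemap_deaths <- plot_ly(
  data = treemap_data,
  type = "treemap",
  labels = ~labels,
  parents = ~parents,
  values = ~values,
  # Parent values already include their children
  branchvalues = "total",
  # Fields shown when hovering over a box
  hoverinfo = "label+value+percent parent+percent root"
)
```

Setting branchvalues to "total" tells Plotly that each parent value is the sum of its children, which matches the aggregation done during wrangling.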

The hoverinfo argument takes care of the data displayed when hovering over any particular element. The specification here is saying:

  • value : show the value for the specific element (i.e. number of deaths/cases).
  • percent parent : show what percentage of the parent value the current child value represents (e.g. number of deaths in Spain/number of deaths in Europe).
  • percent root : show what percentage of the root value the current child value represents (e.g. number of deaths in Spain/number of deaths in the World).

You can read more about further hovering options in the documentation section.

Finally, we wish to actually store these plots as RDS objects to load them when our Shiny app loads instead of having to compute them at runtime. For this we use the following code:
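A minimal sketch of that step, using a stand-in plot and a temporary file path (both are assumptions for illustration):

```r
library(plotly)

# Stand-in plot; in the project this would be one of the treemaps
p <- plot_ly(x = c(1, 2, 3), y = c(2, 4, 8), type = "scatter", mode = "lines")

# Build the full plot specification before saving
built <- plotly_build(p)

# Hypothetical location; a real project would use a folder inside the repo
rds_path <- file.path(tempdir(), "example_plot.rds")
saveRDS(built, rds_path)

# Later, e.g. when the Shiny document starts, load it back
restored <- readRDS(rds_path)
```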

Note: You may find success simply saving the output of the plot_ly(...) function. When I first started working on this project I found that did not work. For me using the plotly_build() function fixed it.

Line Charts

Here we will be making classic line plots describing the number of events by country or continent over time. In this instance, we won’t be making plotly plots directly. Instead, we will be using ggplot2 and then use the ggplotly() function to turn them into interactive plots. We will be making plots like the following:

First, let’s make our ggplot with the cumulative table:
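As a sketch, assuming a cumulative table with the hypothetical columns country, continent, date and deaths (toy values below), the plot could be built like this:

```r
library(ggplot2)

# Toy cumulative table; names and values are illustrative assumptions
cumulative <- data.frame(
  country   = rep(c("Spain", "France", "Japan"), each = 3),
  continent = rep(c("Europe", "Europe", "Asia"), each = 3),
  date      = rep(as.Date("2020-10-29") + 0:2, times = 3),
  deaths    = c(8, 10, 12, 5, 6, 8, 3, 4, 5)
)

deaths_by_country <- ggplot(
  cumulative,
  # Colour by continent, but draw one line per country
  aes(x = date, y = deaths, colour = continent, group = country)
) +
  geom_line() +
  labs(x = "Date", y = "Cumulative deaths", colour = "Continent") +
  theme_minimal()
```

The colour aesthetic maps continents to colours, while group = country keeps one separate line per country.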

If you’re not familiar with ggplot2, all these lines for a single plot may seem like a lot. Even so, I believe this helps structure plots and also makes the code highly readable. Additionally, the group aesthetic (within the aes() function) is not often introduced in beginner ggplot tutorials. In our particular case, it is very important. If we did not specify it, ggplot would think that the grouping variable for our line plot is continent and would make a very messy plot. See below for the comparison:

Left: Correct plot using group aesthetic. Right: Incorrect plot, not using group, ggplot2 thinks the grouping variable is continent

Ok now that we’ve made our ggplot, we can use ggplotly() to turn it into a plotly chart in one line. Moreover, given we have stored our ggplot as a variable we can modify it further, e.g. show the y-axis in log10 scale. Finally, we will pass the new plot through the plotly_build() function to be able to store it.
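These three steps can be sketched as follows, again with a toy table and hypothetical object names:

```r
library(ggplot2)
library(plotly)

# Toy cumulative table; names and values are illustrative assumptions
cumulative <- data.frame(
  country   = rep(c("Spain", "France", "Japan"), each = 3),
  continent = rep(c("Europe", "Europe", "Asia"), each = 3),
  date      = rep(as.Date("2020-10-29") + 0:2, times = 3),
  deaths    = c(8, 10, 12, 5, 6, 8, 3, 4, 5)
)

base_plot <- ggplot(
  cumulative,
  aes(x = date, y = deaths, colour = continent, group = country)
) +
  geom_line()

# One line turns the ggplot into an interactive plotly chart
interactive_plot <- ggplotly(base_plot)

# The stored ggplot can be modified further, e.g. a log10 y-axis,
# then converted and built so that it can be saved with saveRDS()
log_plot <- plotly_build(ggplotly(base_plot + scale_y_log10()))
```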

I made 6 other line plots from the cumulative dataset that you can see in the GitHub repository for this project (including curves for events by continent with and without log10 scale). The code for all these plots follows the same structure.

Again following the same principles and pretty much the same code I made another 6 plots like the one below from the daily dataset. The code for that is also available in the repository.

Map Charts

Believe it or not, I won’t be using the plotly map functions here. I have created maps with them before but I do not recommend using them. I’ve stumbled across a few bugs ([1], [2]) and the plotly community has been very unresponsive. Hence here we will use Leaflet. Unlike Plotly, Leaflet is solely focused on maps, which makes it a much better tool for building them.

Having been built by the RStudio team itself, the R Leaflet package has that R feeling to it, and in particular a dplyr and ggplot feeling, where maps can be built in layers. Let’s jump to the code. I will present how to make a simple static map. Making one with a timeline is much more complicated, hence I will devote a future stand-alone article to it.

First of all, we are going to need a GeoJSON file with the country boundaries. GeoJSON files are a standard used to store geographical boundaries (though many more exist such as shapefiles). I downloaded the countries GeoJSON file from datahub.io. Luckily country boundaries do not change much these days, hence we only need to download and process this file once. It is quite a beefy file for our usage (23MB), hence we will use a little trick to reduce its size. This will greatly improve the load time of our Shiny document (we are smart, we think ahead of time). The size of the file is so big because of all the details in the boundaries (e.g. the resolution of the coastlines). We can make use of the rmapshaper package and more broadly of the mapshaper tool to make our map more coarse and the file much smaller. Below is the script to do all of this from within R.
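To illustrate the simplification step without the 23MB file, here is a sketch on a toy polygon. It uses an sf object for convenience, which is an assumption and may differ from the object type used in the project. For the real file you would first read it, for example with sf::st_read() and your own file path.

```r
library(sf)
library(rmapshaper)

# Toy "country": a 200-vertex circle standing in for a detailed boundary
angles <- seq(0, 2 * pi, length.out = 201)
ring <- cbind(10 * cos(angles), 10 * sin(angles))
ring[nrow(ring), ] <- ring[1, ]  # close the ring exactly

detailed <- st_sf(
  name = "Toyland",
  geometry = st_sfc(st_polygon(list(ring)), crs = 4326)
)

# Keep roughly 10% of the vertices; keep_shapes avoids dropping small polygons
simplified <- ms_simplify(detailed, keep = 0.1, keep_shapes = TRUE)

# Compare the number of boundary points before and after
n_before <- nrow(st_coordinates(detailed))
n_after  <- nrow(st_coordinates(simplified))
```

The keep value is a trade-off: lower values give smaller files but coarser coastlines, so it is worth checking the result visually.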

The world_geojson object contains two main elements of interest: data and polygons. At this stage, the data element contains only two columns, the ISO code and the corresponding region name. The polygons element contains the necessary information to draw the polygons for each region. We are going to first merge the data element with the cumulative dataset. Next, we are going to manually remove the polygons for which no data is available in the cumulative dataset. This last step is necessary because the data and polygons are matched by position instead of by some sort of identifier. If we did not perform this last step, the US would appear in Europe, France in Africa or Asia and so forth.

Note: We are only plotting the latest available data for each region (lines 6–9)
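The positional matching issue can be sketched with plain vectors. Here polygon_iso is a hypothetical stand-in for the ISO codes of the polygons in their stored order, and the cumulative table uses assumed column names and toy values.

```r
library(dplyr)

# Toy cumulative table (illustrative names and values)
cumulative <- data.frame(
  iso   = c("ESP", "ESP", "FRA", "FRA"),
  date  = as.Date(c("2020-10-30", "2020-10-31", "2020-10-30", "2020-10-31")),
  cases = c(100, 120, 90, 95)
)

# ISO codes in polygon order; "ATA" has no COVID data
polygon_iso <- c("FRA", "ATA", "ESP")

# Keep only the latest available row per region
latest <- cumulative %>%
  group_by(iso) %>%
  filter(date == max(date)) %>%
  ungroup()

# Drop polygons without data, then reorder the data to follow the polygons
keep <- polygon_iso %in% latest$iso
polygon_iso_kept <- polygon_iso[keep]
map_data <- latest[match(polygon_iso_kept, latest$iso), ]
```

After this step, row i of map_data describes polygon i, which is what position-based matching requires.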

Now, all we have to do is the plotting using leaflet. As I said previously, Leaflet in R supports being built in layers, using the magrittr pipe operator (%>%). First, we are going to define a suitable colour palette. I’ve chosen Blues, which represents low values with a white-ish colour and uses more and more intense blues as values grow. Then we are going to initialize the main map with leaflet(), just as we often do with ggplot(), and add tiles, which is Leaflet jargon for the formatting of the map (colour of the sea, of the land…; explore different tiles in this demo).
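A sketch of these two steps, with toy case counts and the default OpenStreetMap tiles as assumptions:

```r
library(leaflet)

# Toy case counts; the palette is defined on the log10 scale
cases <- c(1200, 45000, 9000000)
pal <- colorNumeric(palette = "Blues", domain = log10(cases))

# Initialise the map and add the default tiles as the first layer
base_map <- leaflet() %>%
  addTiles()
```

The pal object is a function: given log10 case counts, it returns the matching colour codes.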

Now, we are finally going to have a look at the data on top of the map. We are going to add in the polygons and colour them proportionately to the number of cases in each country as of the end of October.

Note that I’ve coloured the regions proportionately to their log10 number of cases. Otherwise, the disparity is so big that the colour scale is useless. We have also used the ez_labels() function from the ezplot package, which turns large numbers into a readable format (e.g. 86453625 to 86.4M).

Finally, we are going to add a legend to our map and save the file so we can load it later.
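Putting the polygon, legend and saving steps together, a minimal sketch could look like this. The two square "countries" and their case counts are invented for illustration, the data is held in an sf object, and base format() is used as a stand-in for ez_labels().

```r
library(sf)
library(leaflet)

# Helper building a square polygon (toy geometry, not a real border)
square <- function(x0, y0, size = 10) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + size, y0), c(x0 + size, y0 + size),
    c(x0, y0 + size), c(x0, y0)
  )))
}

regions <- st_sf(
  country  = c("Toyland", "Examplia"),
  cases    = c(1200, 9000000),
  geometry = st_sfc(square(0, 0), square(20, 0), crs = 4326)
)

pal <- colorNumeric("Blues", domain = log10(regions$cases))

covid_map <- leaflet(regions) %>%
  addTiles() %>%
  addPolygons(
    fillColor = ~pal(log10(cases)),  # colour on the log10 scale
    fillOpacity = 0.8,
    color = "grey",
    weight = 1,
    # Hover label; format() stands in for ez_labels() here
    label = ~paste0(country, ": ",
                    format(cases, big.mark = ",", scientific = FALSE, trim = TRUE))
  ) %>%
  addLegend(
    pal = pal,
    values = ~log10(cases),
    title = "Cases",
    # Show the legend in original units rather than log10 units
    labFormat = labelFormat(transform = function(x) 10^x)
  )

# Hypothetical output path
map_path <- file.path(tempdir(), "covid_map.rds")
saveRDS(covid_map, map_path)
```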

After this, our map looks like the one below. It is also interactive: you can get more details by hovering over the map (though this does not work in Medium, it will work when we publish the document in the Shiny server).

Leaflet map with

Enough plotting!

Plotting is great but endless and also very context-dependent. Above I have shown how to make common and not so common plots to illustrate the COVID events over time but you could imagine countless more ways to do this!


Making a Shiny RMarkdown Report

RMarkdown documents (.Rmd) are super versatile files that allow you to write intuitive Markdown text and executable R code chunks, all in one place. They are similar to Jupyter Notebooks but are stored as plain text documents as opposed to JSON syntax. RMarkdown documents support a bunch of output formats including PDF, HTML, Word and Beamer slides. They also natively support inline LaTeX and HTML, with the conversion between formats powered by the document converter pandoc. All RMarkdown documents start with a header like this one:

---
title: "Title of my document"
author: "<Name-of-Author>"
date: "<Today's date>"
output: html_document
---

If you are working from RStudio you can create an RMarkdown template by going File > New File > RMarkdown…

Default RMarkdown template in RStudio

You can visualise the rendered document by clicking the Knit option, and you can have several output formats for a single .Rmd document (such as html_document, pdf_document, word_document). My preferred output format is html_document, as it allows for interaction with Plotly and Leaflet objects as well as searchable tables. One thing a static HTML document does not support is an actual user interface where the user modifies something on the front end which triggers a change in the backend and results in an updated front end.

In the modern web, this is supported with JavaScript but as an R programmer, you may not want or have time to learn it, which is why we are going to use Shiny! This article is not a beginner Shiny tutorial but it will hopefully motivate you to get serious with learning Shiny. You can find beginner’s material on RStudio’s site.

The magic with Shiny combined with RMarkdown is that you can simply add this line to your RMarkdown header and your document will work as a Shiny UI:

runtime: shiny

That’s it! Now let’s write our document.

Writing the report

We are going to first load all the plots we have previously made and then design the very basic UI.
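A sketch of the loading step, assuming the plots were saved as .rds files in a single folder. The folder location and file names are hypothetical, and placeholder objects are saved first so the example runs on its own.

```r
# Hypothetical folder holding the saved plots
plot_dir <- file.path(tempdir(), "plots")
dir.create(plot_dir, showWarnings = FALSE)

# Placeholder objects standing in for the real plots
saveRDS(list(kind = "treemap"), file.path(plot_dir, "treemap_deaths.rds"))
saveRDS(list(kind = "line"), file.path(plot_dir, "cumulative_country_deaths.rds"))

# Read every .rds file into a list named after the files
rds_files <- list.files(plot_dir, pattern = "\\.rds$", full.names = TRUE)
plots <- lapply(rds_files, readRDS)
names(plots) <- tools::file_path_sans_ext(basename(rds_files))
```

Keeping the plots in a named list makes it easy to fetch one by name later, e.g. plots[["treemap_deaths"]].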

Now that we have loaded the plots, we simply need to design the Shiny UI, which is composed of an input element and of rendering functions that react to the changes in the input.

An input block looks like this:
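For illustration, here is a sketch using selectInput(); the input ID, label and choices are assumptions, and another input function such as radioButtons() would follow the same pattern.

```r
library(shiny)

event_input <- selectInput(
  inputId  = "event",               # how Shiny refers to this input
  label    = "Event type",          # text shown above the control
  choices  = c("Deaths", "Cases"),  # options offered to the user
  selected = "Deaths"               # default option
)
```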

The first argument is the input ID, which Shiny is going to use to refer to this input handler. Next is the label of this input handler (label), which will appear right above it in the document, then the possible choices (choices) and finally the default choice (selected). Other input handling functions exist, such as fileInput(), dateInput(), sliderInput(), checkboxInput() and others.

Once we have the input handling function ready, we just need a render*() function that handles the input, such as renderPlotly(), renderImage(), renderText() or others. As you can see below, the layout and logic are super easy and intuitive:
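A sketch of the rendering side, assuming an input with the ID event and a named list of pre-built plots. The plot names and the helper pick_treemap() are hypothetical.

```r
library(shiny)
library(plotly)

# Stand-in plots; in the report these would be loaded from .rds files
plots <- list(
  treemap_deaths = plot_ly(x = 1:3, y = c(1, 2, 4), type = "scatter", mode = "lines"),
  treemap_cases  = plot_ly(x = 1:3, y = c(5, 9, 20), type = "scatter", mode = "lines")
)

# Hypothetical helper: map the selected event to the matching plot
pick_treemap <- function(event, plots) {
  plots[[paste0("treemap_", tolower(event))]]
}

# In the document, the expression is re-evaluated whenever input$event changes
treemap_output <- renderPlotly({
  pick_treemap(input$event, plots)
})
```

Inside a code chunk of the Shiny document you would call renderPlotly() directly; it is stored in a variable here only so the sketch runs outside a Shiny session.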

I could stop here but I’ll give a last example to illustrate how a more complex logic can easily be handled with a few lines of code. If you are not interested, please skip to the last section where we finally deploy our Shiny document to the Cloud.

The line plots we made earlier offer a great example to build more complex logic. We can have a handle for the mode of the plots (Daily/cumulative), another for geography (Country/continent), another one for the event type (Deaths/Cases) and a final one to enable log10 scale when the continent geography is chosen:

Here we are going to take advantage of the fact that our plot names are standardised to make the logic more concise (see below).

Depending on the inputs, we are going to build a string corresponding to one of the plot names using the ifelse() function. The logic for the geography choice is a special case: if "Continent" is chosen, we will create a checkbox to enable log10 scale. To do this we use what’s called a conditional panel, as seen in the next code chunk. Additionally, we are going to wrap all of our input handlers inside an inputPanel(), which keeps them tidy:

This results in a tidy panel as shown below:
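A sketch of such a panel follows. The input IDs, labels and default choices are assumptions.

```r
library(shiny)

controls <- inputPanel(
  selectInput("mode", "Mode",
              choices = c("Daily", "Cumulative"), selected = "Cumulative"),
  selectInput("geography", "Geography",
              choices = c("Country", "Continent"), selected = "Country"),
  selectInput("event", "Event type",
              choices = c("Deaths", "Cases"), selected = "Deaths"),
  # The checkbox is only displayed when the JavaScript condition is true
  conditionalPanel(
    condition = "input.geography == 'Continent'",
    checkboxInput("log10", "log10 scale", value = FALSE)
  )
)
```

Note that the condition is a JavaScript expression evaluated in the browser, hence input.geography rather than input$geography.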

Notice how the log10 scale check-box is only available when the geography is set to "Continent"

And below is the logic to fetch the right plot based on the user’s selection of inputs:
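The string-building idea can be sketched as a small function. The naming scheme (mode_geography_event, with an optional _log10 suffix) is a hypothetical convention chosen for this example, not necessarily the one used in the repository.

```r
# Hypothetical helper: build a plot name from the selected inputs
plot_name <- function(mode, geography, event, log10 = FALSE) {
  mode_part  <- ifelse(mode == "Daily", "daily", "cumulative")
  geo_part   <- ifelse(geography == "Continent", "continent", "country")
  event_part <- ifelse(event == "Deaths", "deaths", "cases")
  # The log10 option only applies to continent-level plots
  scale_part <- ifelse(geography == "Continent" && isTRUE(log10), "_log10", "")
  paste0(mode_part, "_", geo_part, "_", event_part, scale_part)
}

plot_name("Daily", "Country", "Cases")
plot_name("Cumulative", "Continent", "Deaths", log10 = TRUE)
```

In the Shiny document, the resulting string would index the list of loaded plots inside a renderPlotly() call, e.g. plots[[plot_name(input$mode, input$geography, input$event, input$log10)]].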

I invite you once again to visit the live document in the cloud, as that will be the best way to understand what the UI experience is like. You can also find the source code for the .Rmd document in GitHub.

Uploading your document to the Cloud

Setting up a shinyapps.io account

If you have made it this far, I hope you have gotten some useful knowledge out of this article. All that is left to do is to upload our document to the RStudio servers. If it’s the first time doing this you will need to create an account first.

Once that is set up, you need to add your Shiny credentials to your RStudio or R session. You can do this via RStudio or via the terminal:

Connect to Shiny via RStudio

  • Open RStudio > Preferences in the top menu bar.
  • Go to Publishing, the panel should look like the one below
  • Then click on Connect… and click on ShinyApps.io. RStudio will guide you from there. You basically need to retrieve a token from your account in shinyapps.io and paste it in RStudio.

Connect to Shiny via the terminal

  • Log in to shinyapps.io
  • Click on Account > Tokens on the sidebar
  • Click on +Add Token, then Show
  • Finally, copy the code block that the shinyapps.io page is showing you which should look like:
rsconnect::setAccountInfo(name="<user-name>",
                          token="<token>",
                          secret="<SECRET>")

Publishing your app

From RStudio

1 – Click on the blue publishing button as shown.

2 – A window like this one should open. You can give your ShinyApp whatever name you want; here that will be COVIDEDA. On the left, you can see the files and directories that will be uploaded to the server.

3 – Click on Publish

From the terminal

You can also upload the app from the R console by using the following command:
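One way to do this is with rsconnect::deployApp(). The wrapper function, the file name report.Rmd and the call below are assumptions for illustration, so adapt them to your own project.

```r
# Hypothetical helper: deploy the folder that contains an .Rmd document
publish_report <- function(doc_path, app_name) {
  rsconnect::deployApp(
    appDir        = dirname(doc_path),   # folder uploaded to the server
    appPrimaryDoc = basename(doc_path),  # document served as the app
    appName       = app_name             # name shown in the shinyapps.io URL
  )
}

# Example call; it needs a configured shinyapps.io account,
# so it is left commented out here:
# publish_report("report.Rmd", app_name = "COVIDEDA")
```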

From the moment your document (or app) is first published a folder called rsconnect/ will be created in your project directory. Every time you follow these steps again, you will be able to update your document! You can visit my report on this link: https://lucha6.shinyapps.io/COVIDEDA/

Conclusion

In this article, I have illustrated how to create an interactive report to explore COVID19 data. We have made some cool plots using ggplot, plotly and leaflet. We have also learnt how to deploy 100% interactive documents with the help of RMarkdown and Shiny. The next article will cover how to automate the process of updating the datasets, plots and deploy the document without us having to even turn on our computers! 😮🙃

