
Watch the rise of plastics in our oceans – Part one

In this series of articles, learn how to build a web app in R Shiny using just-released data on ocean plastics

In a very recent article in the journal Nature, a team of marine scientists analyzed data that had been captured in devices called Continuous Plankton Recorders (CPRs) since the 1930s. CPRs are towed by vessels during their routine journeys and are designed to capture plankton for research purposes.

But beginning in 1957, CPRs started to accidentally capture another piece of data – one which we didn’t know we would find useful until recently – because in that year, a CPR was found to be contaminated with a man-made plastic object for the first time.

For their journal article, the team of scientists painstakingly put together a dataset of all such incidents of man-made plastics being found in CPRs in the North Atlantic, Arctic and Northern European seas from the first recorded incident in 1957 through to 2016. They also gave the public access to that dataset – a total of 208 incidents of various types of plastic ranging from fishing nets to plastic bags.

As I looked at the dataset here, I realized that it is a great dataset to help teach someone how to create a data exploration app using R Shiny, and that is what I will do in this series of articles. The dataset is limited in size, so it is easy to work with, but it contains a wide range of information, including dates, text, and geographic co-ordinate data.

To follow this series, you need some basic experience in data manipulation in R, including using RStudio as a coding environment. Some previous experience using R Markdown, Shiny, and simple plotting in ggplot2 is helpful but not essential. You will need to install the following packages before you start: shiny, rmarkdown, ggplot2, tidyverse, and leaflet.

In this series you will learn how to:

  • Design a simple web dashboard format in R Markdown, and prepare your data for use in this dashboard
  • Understand reactivity in shiny apps, and how inputs and outputs interact
  • Create simple charts that change according to user inputs
  • Visualize co-ordinate data on maps
  • (Advanced): create animated time series graphics
  • Publish your app so that others can play with it

To see the app that will be the end product of this series, you can visit [here](https://github.com/keithmcnulty/cpr_data), and the full set of code is on GitHub here. Here is one of the cooler outputs – a clip from an animated timeline of all instances of man-made plastics discovered in CPRs since 1957.

What is Shiny?

shiny is an open source R package that provides an elegant and powerful web framework for building web applications using R.

One way of describing the value of shiny is as follows: imagine you are an analyst in a company and you have access to a confidential set of data that many people want statistics on. You get at least six requests a day for predictable statistics like grouped averages based on specified filters. One way of dealing with this is to just manually do these analyses yourself every time and send the results to the requestor.

Another is to build a simple self-serve app where requestors can visit any time, select their filters and other inputs, and see the results instantly. shiny allows exploration of a data set via the web, where the developer can decide what can be seen and how it can be seen. Using shiny, R programmers can save a lot of manual effort at the cost of just a little extra coding. (In fact, with every release, Shiny can do a lot more than just this, but let’s stick to this for now.)

To use shiny, R programmers need to delve slightly into the world of reactivity. They need to understand how inputs feed into reactive variables in R and how to control that reactivity in order to display the output that they desire. If you don’t have time to learn and master JavaScript, shiny is a great option for publishing reactive analytics via the web.
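To make the idea of reactivity concrete, here is a minimal, self-contained sketch of a Shiny app (a hypothetical example of my own, not the CPR app) in which an input feeds a reactive expression, which in turn feeds an output:

```r
library(shiny)

ui <- fluidPage(
  numericInput("n", "Sample size:", value = 10, min = 1),
  textOutput("mean_label")
)

server <- function(input, output) {
  # reactive(): re-evaluates automatically whenever input$n changes
  sampled <- reactive({
    rnorm(input$n)
  })
  # this output depends on the reactive expression, so it updates too
  output$mean_label <- renderText({
    paste("Mean of", input$n, "draws:", round(mean(sampled()), 2))
  })
}

# Launch locally with: shinyApp(ui, server)
```

The key point is that you never tell the output to update; Shiny tracks the dependency chain from `input$n` through `sampled()` to the rendered text and re-runs only what is needed.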

Designing your dashboard

How you design your dashboard depends heavily on how it is going to be used. If it is intended to be a product to be used by many people to fulfill a particular need, it’s advisable to follow the principles of good UX design and conduct user interviews to understand exactly what your users need before proceeding to mockups and technical development.

In this case, since this is a fun project and not for a massive user base, we can make some of our own design decisions. The first thing we need to do is look at the data and ask: what do we think people will be interested in seeing? Here is a snapshot of the CPR data we are dealing with:

We can see a few things that would be of interest. First, the ‘Observation’ column helps us identify what type of plastic was found, allowing us to add a ‘type’ to the data set (similar to how the scientists in the Nature article conducted their analysis), which may help for filtering. There is also a ‘Year of tow’ column, helping us understand the timeline. There are co-ordinates for the start and end of the tow, helping us visualize where the plastic was found, and finally there are names for different maritime regions, which may also be helpful for filtering.

Therefore, it might be useful to include the following in our app:

  1. The ability to filter by plastic type and maritime region for all analyses
  2. The ability to filter by year period for some analyses
  3. The ability to see some basic statistics on incidences of plastics in CPRs (potentially guided by the analysis done in the Nature article).
  4. The ability to visualize where and when these instances occurred on maps.

Therefore, we are going to design our dashboard as follows:

The GLOBAL SIDEBAR will always be visible. It will give some general use guidelines and context and it will allow filtering by plastic type and by maritime region, so that any results displayed in the app will reflect these filters. Then there will be three independent pages, each one focusing on a different aspect of the data:

  • STATS will present some descriptive statistics on the incidents, with some further filtering options
  • LOCATIONS will present the locations of incidents on a map of some form
  • TIME will show in some form how incidents occurred over time
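One way to sketch this layout is as an R Markdown dashboard. The skeleton below is a hypothetical sketch assuming the flexdashboard package (the file name and titles are my own, and the actual app may be structured differently); in flexdashboard, a level-1 section carrying the `{.sidebar}` attribute placed before the first page becomes a sidebar that is visible on every page:

```markdown
---
title: "CPR Plastics Explorer"
output: flexdashboard::flex_dashboard
runtime: shiny
---

Sidebar {.sidebar}
=====================================
<!-- global filters: plastic type, maritime region -->

STATS
=====================================

LOCATIONS
=====================================

TIME
=====================================
```

Each level-1 section after the sidebar becomes its own page with a tab in the dashboard's navigation bar, matching the three independent pages described above.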

Preparing the data for use in the app

With our design in mind, we now need to take the dataset given to us by the researchers and adapt it to be used in our app. There are two steps to this.

First, we need to add any new columns necessary for the analyses we have in mind in our design. As I mentioned above, this dataset is already in pretty good shape, but we do need to parse the Observation column to determine a ‘type’ for each incident, and we could do with cleaning up the column names to make them simpler to work with, as they will look pretty messy when loaded directly into R. So let’s start up a project in RStudio and call it cpr_data, create a subfolder in that project called data, and place the original xlsx file the researchers provided in it. We can write a simple R script to add the new ‘type’ column and to tidy up the column names – let’s call it prep_data.R. We can load the dataset either by opening it in Excel, deleting the first row, resaving it as a CSV file, and using read.csv(), or, as I do below, by reading it into R directly using the openxlsx package.

# prep data for use in CPR app

# load libraries
library(dplyr)
library(openxlsx)

# load original data file and make colnames easier to code with
data <- openxlsx::read.xlsx("data/Supplementary_Data_1.xlsx", sheet = 1, startRow = 2)
colnames(data) <- gsub("[.]", "", colnames(data)) %>% 
  tolower()
colnames(data)[grepl("region", colnames(data))] <- "region"

# create column to classify each observation by key term
data <- data %>% 
  dplyr::mutate(
    type = dplyr::case_when(
      grepl("net", observation, ignore.case = TRUE) ~ "Netting",
      grepl("line|twine|fishing", observation, ignore.case = TRUE) ~ "Line",
      grepl("rope", observation, ignore.case = TRUE) ~ "Rope",
      grepl("bag|plastic", observation, ignore.case = TRUE) ~ "Bag",
      grepl("monofilament", observation, ignore.case = TRUE) ~ "Monofilament",
      grepl("string|cord|tape|binding|fibre", observation, ignore.case = TRUE) ~ "String",
      TRUE ~ "Unclassified"
    )
  )

# save as an RDS file
saveRDS(data, "data/data.RDS")

In this script we use grepl() to identify terms in the text strings in the Observation column, and then we use dplyr::case_when() to assign each matching term to a type. Where no terms match, we assign the type "Unclassified". We also change the column names into simple lower-case strings that are easy to code with.
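As a quick, self-contained illustration of this grepl() plus case_when() pattern (using toy observation strings of my own, not the real dataset), note that conditions are checked in order and the TRUE branch acts as the fallback:

```r
library(dplyr)

# toy observation strings (not from the real dataset)
obs <- c("Fishing net around propeller", "Plastic bag fragment", "Unknown debris")

type <- dplyr::case_when(
  grepl("net", obs, ignore.case = TRUE) ~ "Netting",
  grepl("bag|plastic", obs, ignore.case = TRUE) ~ "Bag",
  TRUE ~ "Unclassified"  # fallback when no pattern matches
)

type  # "Netting" "Bag" "Unclassified"
```

Because the first matching condition wins, the order of the patterns matters: a string mentioning both "net" and "plastic" would be classified as "Netting" here.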

Second, we need to re-save this transformed data. Later, when we have written our app and deployed it so that others can access it, this data file will be bundled with the app, and the app will read it into its environment. If this dataset were very large, we would need to think about the fastest file format R could read from. But in this case the dataset is small, so we can choose any file format to save it as. We will keep things simple and save the dataset as an R object in an RDS file.
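One reason RDS is convenient is that the round trip is lossless: the object you read back with readRDS() is identical to the one you saved, including column types and attributes. A toy illustration (my own example data, written to a temporary file):

```r
# save a small data frame to an RDS file and read it back
df <- data.frame(year = c(1957, 2016), type = c("Netting", "Bag"))
path <- file.path(tempdir(), "data.RDS")
saveRDS(df, path)

identical(readRDS(path), df)  # TRUE
```

In the app itself, the call is simply readRDS() on the bundled file, with the path relative to the app's own directory.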

Next time…

So we have got our basic design planned, and we have the data set up in the right way. In the next part of this series, I’ll go through how to get the simple outline of the dashboard up and running. I’ll also discuss how to handle inputs and reactive variables and how to build some basic descriptive plots in ggplot2 that respond to user input.

Exercises

Here are some follow-up exercises which you can use to test how well you have absorbed the content of this article:

  1. What is R Shiny and why might it be useful for this dataset?
  2. How might you go about designing this dashboard if the user base was going to be large and varied?
  3. How does a local dataset get used in an R Shiny app? What are the key things to think about when you create a dataset to be used in an R Shiny app?
  4. Read the Nature article that gave rise to this dataset. What other ways might you design this dashboard having read this article?

_Originally I was a Pure Mathematician, then I became a Psychometrician and a Data Scientist. I am passionate about applying the rigor of all those disciplines to complex people questions. I’m also a coding geek and a massive fan of Japanese RPGs. Find me on LinkedIn or on Twitter._

