
How to write a custom function to generate multiple plots in R

An easy introduction to writing custom functions

A visual of a pair of hands typing on a laptop with code on the screen. Image by Author

I always felt intimidated by writing functions in R because I was so comfortable with the out-of-the-box solutions that come with tidyverse. When I started coding in Python, I found myself writing more and more custom functions to replicate my favorite dplyr functions in Python.

Learning how to write functions in Python has made me a better programmer in R, too. It’s helped me automate my work and ensure reproducibility. The most common use case I have for custom functions is to generate multiple plots in R.

Reasons to write custom functions to generate plots in R:

1. Automate reporting: Sometimes you have to build a report with the same type of visual (like a bar chart) for different variables. For example, I formerly worked for the Mayor of Los Angeles, Eric Garcetti, during the height of the pandemic. Our data team was tasked with producing reports for his daily Covid press briefings that covered daily case, death, hospitalization, testing, and vaccination rates. We used the same chart format, a bar chart, for each of these variables. Instead of repeating the graph code for each variable, we automated the reporting process by writing a custom function and looping through the variables to produce the charts. This way, we just had to pull in the data each day and run the script to generate new graphs.

2. Stepping stone to building a dashboard: Once you have automated the reporting process, it’s a natural progression to turn your report into a dashboard for an interactive experience. R has a great dashboard library, Shiny, that makes it easy to build a web application directly in R. When you already have a custom function for generating plots, you can reuse the same function in your dashboard code to let users select the variable of interest.

3. Create a DIY facet wrap for Plotly: One of my favorite features of [ggplot2](https://ggplot2.tidyverse.org/) is facet_wrap, which generates multiple subplots in one view. It’s a single line of code in ggplot2. Unfortunately, there’s nothing like this for Plotly yet, so I had to recreate it with a custom function.

I acknowledge that we can now use ggplotly() to create an interactive version of a ggplot2 graph with facet_wrap. But I have personally run into performance issues with ggplotly(), so I choose to build everything directly in Plotly instead.


Here’s a breakdown of the logic for creating a custom function:

  1. Start with creating one visual first
  2. Understand which variable you want to use to create multiple plots
  3. Change the graphing code into a function
  4. Loop through your unique values to generate multiple plots

Let’s work with the adorable Palmer Penguins dataset from Allison Horst. This dataset has three unique species of penguins – Chinstrap, Gentoo, Adelie:

Artwork by @allison_horst

Here’s how to load the data:

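# Load libraries
library(palmerpenguins)
library(tidyverse)
library(plotly)  # provides plot_ly(), layout(), and subplot() used below

# Load the penguins dataset from the palmerpenguins package
data("penguins", package = "palmerpenguins")
# Write penguins to a `df` variable.
# I'm doing this simply because it's easier to type `df` than `penguins` each time.
df <- penguins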

1. Start with creating one visual first

Let’s create a bar plot for the Adelie species to see their median body mass for each year.

# Create a summary table to calculate the median body mass by species and year
summary <- df %>%
  group_by(species, year) %>%
  summarise(median_body_mass = median(body_mass_g, na.rm = TRUE))

# Create a Plotly bar chart for the median body mass of the Adelie penguins
plot_ly(
  data= {summary %>% filter(species == "Adelie")},
  x = ~year,
  y = ~median_body_mass,
  color = ~year,
  type = "bar",
  showlegend = FALSE) %>%
  layout(
    yaxis = list(title = 'Median Body Mass (g)'),
    xaxis = list(title = 'Year',tickvals = list(2007, 2008, 2009)),
    title = "Median Body Mass for Adelie Penguins") %>%
  hide_colorbar() %>%
  suppressWarnings()
A bar chart of the median body mass for Adelie Penguins for the years 2007, 2008, and 2009.

2. Understand which variable you want to use to create multiple plots

_aka: what’s your facet_wrap variable?_

Here’s the view of our summary table. We want to create the same bar graph for each species. In this example, our variable of interest is the species variable.

A view of our summary table that displays the median body mass for each penguin species - Adelie, Chinstrap, and Gentoo - for the years 2007, 2008, and 2009.

3. Change the graphing code into a function

Identify the components in your graphing code that need to be generalized. Now, we will swap out any instance of the species name Adelie with a generalized variable:

Description of our Plotly code that shows which variables we will want to generalize. In this example, we want to swap out any instance of the species name "Adelie" with a generalized variable so we can create the plot for each new species.

Transform the graphing code into a function. This function takes one argument, species_name, which is entered as a string. Notice that every instance of the name Adelie has been replaced with the variable species_name:

plot_fx <- function(species_name){
  plot_ly(
    data= {summary %>% filter(species == species_name)},
    x = ~year,
    y = ~median_body_mass,
    color = ~year,
    type = "bar",
    showlegend = FALSE) %>%
    layout(
      yaxis = list(title = 'Median Body Mass (g)'),
      xaxis = list(title = 'Year',tickvals = list(2007, 2008, 2009)),
      title = paste("Median Body Mass for", species_name, "Penguins")) %>%
    hide_colorbar() %>%
    suppressWarnings()
  }

Here’s an example of how to run the function to generate your new plot. Let’s make the same bar chart for the species Chinstrap:

# Run function for species name "Chinstrap"
plot_fx("Chinstrap")
A bar chart of the median body mass for Chinstrap Penguins for the years 2007, 2008, and 2009. This was generated by the custom function we created in the post.
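
One possible refinement, shown here only as a sketch: the function above reads the summary table from the global environment, so it only works when a table with that exact name exists. The variant below takes the data as an argument and stops early if the species is not found. The names plot_species and summary_tbl are my own placeholders, not part of the original code.

```r
library(dplyr)
library(plotly)
library(palmerpenguins)

# Rebuild the summary table so this block runs on its own
# (summary_tbl is a placeholder name chosen for this sketch)
summary_tbl <- penguins %>%
  group_by(species, year) %>%
  summarise(median_body_mass = median(body_mass_g, na.rm = TRUE),
            .groups = "drop")

# Hypothetical variant of plot_fx: the data is passed in explicitly
plot_species <- function(data, species_name) {
  # Stop with an error if the species is not present in the data
  stopifnot(species_name %in% data$species)

  data %>%
    filter(species == species_name) %>%
    plot_ly(x = ~year, y = ~median_body_mass, color = ~year,
            type = "bar", showlegend = FALSE) %>%
    layout(
      yaxis = list(title = "Median Body Mass (g)"),
      xaxis = list(title = "Year", tickvals = list(2007, 2008, 2009)),
      title = paste("Median Body Mass for", species_name, "Penguins")) %>%
    hide_colorbar()
}

# Build the plot for one species; print gentoo_plot to view it
gentoo_plot <- plot_species(summary_tbl, "Gentoo")
```

Passing the data in makes the function easier to reuse with a filtered or refreshed table, which is handy for the daily-report scenario described earlier.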

4. Loop through your unique values to generate multiple plots

From here, you need a list of all the unique species to loop through for your function. We get that with unique(summary$species).

Start by creating an empty list to store all your plots:

# Create an empty list for all your plots
plot_list = list() 

Loop through the unique species values to generate a plot for each species. Then, add each plot to the plot_list:

# Run the plotting function for all the species
for (i in unique(summary$species)){
    plot_list[[i]] = plot_fx(i)
}

# Now you have a list of three plots - one for each species.
# You can see each plot by changing the value within the square brackets to 1, 2, or 3
plot_list[[1]]
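
If you prefer to skip the empty list and the for loop, lapply() can do the same job in one step. This is a minimal sketch rather than the approach used in the rest of the post; summary_tbl and the trimmed-down plot_fx below are stand-ins so the block runs on its own.

```r
library(dplyr)
library(plotly)
library(palmerpenguins)

# Stand-in summary table (placeholder name for this sketch)
summary_tbl <- penguins %>%
  group_by(species, year) %>%
  summarise(median_body_mass = median(body_mass_g, na.rm = TRUE),
            .groups = "drop")

# Trimmed-down stand-in for the plotting function defined earlier
plot_fx <- function(species_name) {
  summary_tbl %>%
    filter(species == species_name) %>%
    plot_ly(x = ~year, y = ~median_body_mass,
            type = "bar", showlegend = FALSE)
}

# Convert the factor to character so the list names are plain strings
species_names <- as.character(unique(summary_tbl$species))

# lapply() calls plot_fx() once per species and returns a list of plots
plot_list <- lapply(species_names, plot_fx)
names(plot_list) <- species_names

# Plots can now be pulled out by name or by position,
# for example plot_list[["Gentoo"]] or plot_list[[3]]
```

Naming the list elements is optional, but it lets you pull out a plot by species name rather than remembering its position.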

Now visualize all the plots in one grid with the subplot function in Plotly:

# Plot all three visuals in one grid
subplot(plot_list, nrows = 3, shareX = TRUE, shareY = FALSE) 
Three bar charts of the median body mass for Adelie, Chinstrap, and Gentoo Penguins for the years 2007, 2008, and 2009. This was generated by looping through each unique species in our dataset for our custom graphing function.

We did it!

I know that’s a lot more work than using the facet_wrap function in ggplot2, but understanding how to create functions helps with automating reports and creating more dynamic dashboards and visuals!
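
For comparison, here is roughly what the ggplot2 version could look like. Treat it as a sketch: summary_tbl is a placeholder name, and converting year to a factor is my own choice to get one bar per year.

```r
library(dplyr)
library(ggplot2)
library(palmerpenguins)

# Stand-in summary table (placeholder name for this sketch)
summary_tbl <- penguins %>%
  group_by(species, year) %>%
  summarise(median_body_mass = median(body_mass_g, na.rm = TRUE),
            .groups = "drop")

# facet_wrap() splits the chart into one panel per species
facet_plot <- ggplot(summary_tbl,
                     aes(x = factor(year), y = median_body_mass)) +
  geom_col() +
  facet_wrap(~species, nrow = 3) +
  labs(x = "Year",
       y = "Median Body Mass (g)",
       title = "Median Body Mass for Palmer Penguins")

# Print facet_plot to draw the three stacked panels
```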

Bonus Step! Adding Annotations to Get a Title for Each Plot

To get a title on each of the subplots in the last visual, you have to use annotations in Plotly.

# Create a list of annotations
# The x value is the horizontal position on the entire subplot grid
# The y value is the vertical position on the entire subplot grid

my_annotations = list(
  list(
    x = 0.1, 
    y = 0.978, 
    font = list(size = 16), 
    text = unique(summary$species)[[1]], 
    xref = "paper", 
    yref = "paper", 
    xanchor = "center", 
    yanchor = "bottom", 
    showarrow = FALSE
  ), 
  list(
    x = 0.1, 
    y = 0.615, 
    font = list(size = 16), 
    text = unique(summary$species)[[2]], 
    xref = "paper", 
    yref = "paper", 
    xanchor = "center", 
    yanchor = "bottom", 
    showarrow = FALSE
  ), 
  list(
    x = 0.1, 
    y = 0.285, 
    font = list(size = 16), 
    text = unique(summary$species)[[3]], 
    xref = "paper", 
    yref = "paper", 
    xanchor = "center", 
    yanchor = "bottom", 
    showarrow = FALSE
  ))
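
The three entries above differ only in their text and y value, so the same list can be built with a small helper. This is a sketch under my own naming (make_title_annotation is hypothetical), and it reuses the hand-tuned y positions from the code above rather than computing them.

```r
# Hypothetical helper: builds the annotation for one subplot title
make_title_annotation <- function(label, y, x = 0.1) {
  list(
    x = x,
    y = y,
    font = list(size = 16),
    text = label,
    xref = "paper",
    yref = "paper",
    xanchor = "center",
    yanchor = "bottom",
    showarrow = FALSE
  )
}

# Species names are typed out here so the block runs on its own
species_names <- c("Adelie", "Chinstrap", "Gentoo")
# Same hand-tuned vertical positions as in the hard-coded version
y_positions <- c(0.978, 0.615, 0.285)

# Map() pairs each name with its y position; unname() drops the names
# that Map() adds, so the result is an unnamed list like the original
my_annotations <- unname(Map(make_title_annotation, species_names, y_positions))
```

The positions still need to be tuned by eye, but changing the font size or anchor now means editing one place instead of three.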

This is kind of a messy, trial-and-error process, because you have to hard-code the positions. Here’s a breakdown of how to do it:

  1. Create a list of annotations for each subplot title: The annotations will be a list of lists. Each element is a list that holds all the information for one subplot title. In our example, I want one title that displays the species name for each subplot, so I will have a list with 3 elements. Here’s what goes into each element:
Description of our annotations code that shows what the 'x' , 'y', and 'text' variables correspond to.
  • x: This is a value between 0 and 1 and corresponds to the position for the entire graphic, with 0 at the left end and 1 at the right end.
  • y: This is a value between 0 and 1 and corresponds to the position for the entire graphic, with 0 at the bottom and 1 at the top.
  • text: This is the text you want to display for each of the subplot titles.
  • xref and yref: Select ‘paper’ to position the annotation in normalized coordinates relative to the plotting area, where 0 is the left (or bottom) edge and 1 is the right (or top) edge. Alternatively, you can use an axis domain reference such as ‘x2 domain’ so that the position is relative to the domain of an individual subplot.
  • xanchor: Sets the text box’s horizontal position anchor. This anchor binds the x position to the "left", "center" or "right" side of the annotation. Imagine where your point is based on your x and y coordinates, and how you want the text to align relative to the position.
Description on xanchor alignment for Plotly layout.
  • yanchor: Sets the text box’s vertical position anchor. This anchor binds the y position to the "top", "middle" or "bottom" of the annotation. Imagine where your point is based on your x and y coordinates, and how you want the text to align relative to the position.
Description on yanchor alignment for Plotly layout.
  • showarrow: Set this to TRUE or FALSE to control whether Plotly draws an arrow that points to the location of your annotation. This is helpful if you want to label a specific point on a scatter plot. Since we are just adding text labels onto each subplot, the arrow is unnecessary in this example.

  2. Add the layout option to your subplot code: You can add layout options with the layout() function.

# Run the subplot line including a layout
subplot(plot_list, nrows = 3, shareX = TRUE, shareY = FALSE) %>%
  layout(annotations = my_annotations,
         title = "Median Body Mass for Palmer Penguins",
         xaxis = list(tickvals = list(2007, 2008, 2009)),
         xaxis2 = list(tickvals = list(2007, 2008, 2009)),
         xaxis3 = list(tickvals = list(2007, 2008, 2009)))

Here are some options you can specify:

  • annotations: The list of annotations you created that include all the information for the text and position of each label
  • title: This is the text for the title of the entire grid
  • xaxis, xaxis2, xaxis3: In Plotly, each subplot has its own x axis properties. xaxis refers to the first subplot, which in this example is the one for the Adelie penguin species. The remaining x axes can be referenced by numbering each one. Here I am specifying the tick values so that every subplot shows the same standardized years.

Conclusion

While this is a simple example, I hope it opens up more possibilities for improving your data science workflow with custom functions! You can take the steps we followed here and generalize them to writing custom functions overall by:

  • Starting with a simplified example
  • Swapping out your variable into a generalized variable
  • Applying the function to the rest of your data

Once you have the basics down, you can expand on this to ensure reproducibility of your work through automated reports, dashboards, and interactive visuals. Having this foundation also helps you become more proficient in both languages – R and Python – because you can reconstruct what works in one language in the other. In a world where R and Python are becoming increasingly interchangeable, this offers possibilities that are not limited to a specific language!


All images unless otherwise noted are by the author.

