Since last year’s release of dplyr 1.0.0 I’ve really enjoyed experimenting with what is possible with this seminal R package, and lately I’ve been considering how dplyr can be used to run any sort of function against inputs from a data frame. This means that you can use dplyr to perform as many actions as there are rows in your dataframe in a single (often very simple) command.
In this article I will show you how to use this concept to do the following using a single piped command in dplyr:
- Run many different models to assess which one has the best fit
- Create a batch of charts
- Write an arbitrary number of csv files
- Write an excel file with an arbitrary number of worksheets
- Generate a batch of pdf reports
I want to emphasize that these are just examples of a broader idea – that you can integrate dplyr with other packages and functions to perform a much broader array of tasks than you might have originally thought possible. Hopefully you can use the examples here to think up some other uses that might make your life easier in the future.
0. Before we start
In this article we are going to use a couple of key functions that are either new to dplyr 1.0.0+ or substantially improved. The first is rowwise()
. This is an important function for performing actions across individual rows of a dataframe. You can think about it as grouping by single rows.
Let’s take a look at an mtcars
example to illustrate what I mean. If we wanted to take the minimum value of gear
and carb
for each observation in mtcars
you might naturally try:
library(dplyr)
mtcars %>%
dplyr::mutate(min = min(gear, carb))
However you get this, which is not what you intended probably:

This is because by default dplyr works column-wise, and so your code is calculating the minimum value to be found in the entire two columns gear
and carb
. To work across rows, you need to use rowwise()
:
mtcars %>%
dplyr::rowwise() %>%
dplyr::mutate(min = min(gear, carb))
Now we see that the data has been grouped by row before returning our minimum, which is what we actually intended.

The second function I want to highlight is nest_by()
. This is a shortcut function that creates rows of nested data. For example:
mtcars %>%
dplyr::nest_by(cyl)
You’ll see that this creates a list column called data
with three subsets of mtcars
according to the three different values of cyl
:

You’ll see how useful these two function are as we now move into our five examples.
1. Run many different models to test for fit
The broom
package in R contains some great functions for producing tidy model output. For example, the glance()
function allows you to easily see the overall Statistics of a model in a tidy table. For example, this code:
library(broom)
broom::glance(
lm("mpg ~ cyl + gear", mtcars)
)
will produce the following row of statistic for the model specified.

So, if we want to test the fit of a whole bunch of model combinations, all we need to do is write a column with the model combinations we are interested in and do a rowwise mutate of broom to get all their fit statistics in a single piped command:
# create a column with model formulas to test
models <- data.frame(
formula = c(
"mpg ~ cyl",
"mpg ~ cyl + disp",
"mpg ~ cyl + disp + hp",
"mpg ~ cyl + disp + hp + drat"
)
)
# run them all and get fit statistics
models %>%
dplyr::rowwise() %>%
dplyr::mutate(
broom::glance(lm(formula, mtcars))
)
Et voila, all your models fit statistics:

2. Create a batch of charts
You might have noticed the list column concept earlier. List columns allows us to put any kinds of objects into the columns that we mutate or summarise using dplyr. Let’s say that we want to generate a chart that shows a scatter plot of mpg
vs wt
for different values of cyl
. We can use our nested data to mutate each chart into a new column:
library(ggplot2)
scatters <- mtcars %>%
dplyr::nest_by(cyl) %>%
dplyr::mutate(
charts = list(
ggplot(data, aes(x = wt, y = mpg)) +
geom_point()
)
)
This gives us the following results:

You can see that the ggplot2
objects are stored in our new charts
column. If you want to see one you can simply call it using:
scatters$charts[[1]]

3. Write a batch of csv files
You might be able to see where I am going now. Let’s say that we wanted to write our different nested dataframes to different csv files. Again we can mutate a function which writes each nested data frame into an appropriately named csv file:
mtcars %>%
dplyr::nest_by(cyl) %>%
dplyr::mutate(
write.csv(data, paste0("cyl", cyl, ".csv"))
)
This will produce the following unremarkable output:

But have a look in your project folder and you’ll find your three csv files right there waiting for you.
4. Write an Excel file with an arbitrary number of worksheets
Extending what we are doing above let’s write a function that can put the three nested dataframes into different worksheets of an Excel workbook using the openxlsx
package:
library(openxlsx)
write_worksheet <- function(wb, sheet, data) {
addWorksheet(wb, sheetName = sheet)
openxlsx::writeData(wb, sheet, data)
}
Now we can start a workbook, write everything into different worksheets using dplyr
and then save it.
wb <- createWorkbook()
mtcars %>%
dplyr::nest_by(cyl) %>%
dplyr::mutate(
write_worksheet(wb, paste("cyl", cyl), data)
)
saveWorkbook(wb, "test.xlsx")
While your terminal output of this command will be unremarkable, have a look for your newly writtentest.xlsx
file, open it, and you should see this:

5. Generate a batch of pdf reports
Let’s put a few things together now and use rmarkdown
to batch produce a bunch of pdf reports usingdplyr
. First, we can save a template.Rmd
file that expects to receive one of our nested dataframes (data
)and a value for cyl
, calculate a few averages and display the data and our ggplot charts in a basic report. So here is the contents of our template.Rmd
file:
---
title: "Report on your data"
author: "Keith McNulty"
date: "15/03/2021"
output: pdf_document
---
```{r setup, echo = FALSE}
library(ggplot2)
knitr::opts_chunk$set(echo = FALSE)
Here is the data for the cyl
value r cyl
.
Your averages
Your average mpg
value is r mean(data$mpg)
.
Your average wt
value is r mean(data$wt)
.
Your chart
Here is your scatter plot:
ggplot(data, aes(x = wt, y = mpg)) +
geom_point()
Your raw data
Here is your raw data:
knitr::kable(data)
Now we write a simple function to render our R Markdown report for a specific value of `cyl` :
library(rmarkdown)
function to write a pdf based on an Rmd template
write_markdown <- function(data, cyl_value) {
data <- data
cyl <- cyl_value
rmarkdown::render("template.Rmd",
output_file = paste0(cyl, "_report.pdf"))
}
And then we let `dplyr` do its magic again:
mtcars %>% dplyr::nest_by(cyl) %>% dplyr::mutate(write_markdown(data, cyl))
You'll see some magic happening and then you will have three pdf reports sitting in your project directory, looking like this:

These are obviously general examples using a simplistic data set, but the aim here is to demonstrate what is clearly a very wide range of possibilities for how you can integrate `dplyr` with other functions in R to make it easy to perform a large batch of tasks in a single simple command. I really encourage you to explore and play around with this and if you find other fun uses for this please do let everyone know by adding a comment to this article.
---
_Originally I was a Pure Mathematician, then I became a Psychometrician and a Data Scientist. I am passionate about applying the rigor of all those disciplines to complex people questions. I'm also a coding geek and a massive fan of Japanese RPGs. Find me on [LinkedIn](https://www.linkedin.com/in/keith-mcnulty/) or on [Twitter](https://twitter.com/dr_keithmcnulty). Also check out my blog on [drkeithmcnulty.com](http://drkeithmcnulty.com/)._
