Patchwork — The Next Generation of ggplots

Extending the versatility of ggplot2 even further

Balaji Sundararaman
Towards Data Science


Photo by Nicolas Prieto on Unsplash

For data visualization in R, ggplot2 has been the go-to package for generating awesome, publication-quality plots. Its layered approach lets us start with a simple visual foundation and keep adding embellishments with each layer. Even the most basic plots with default settings look and feel way better than base R graphics. So much so that if you plan or need to share the output of your analytic work with internal stakeholders or external clients, it is imperative that you have ggplot2 in your skillset. Yes, there is a learning curve (which self-respecting skill does not have one!), but once mastered, ggplot2 provides superb possibilities for customization and for conveying your insights in a striking manner.

But this post is not about ggplot2 per se! Rather, it is about the awesome layout functionality now available beyond the standard grid options offered by the facet_wrap() and facet_grid() functions of ggplot2.

Flexible layout options are especially useful, and sorely missed, when you would like to group a set of related plots into a single graphic but find that some plots would look better wider than others, which messes with your grid constraints of rows, columns and row/column widths. Or worse, there is a rash of legends floating all over the place.

Enter patchwork!

patchwork builds on the power of +, which now not only adds a layer to the base plot in ggplot2 but also lets us control the placement and size of the subplots. All of this with just 3 “operators”: +, () and /. Let's check it out!

Installation

Ensure you have already installed the ggplot2 package. If you have not used ggplot2 before, head to the comprehensive tutorial by Selva Prabhakaran.

library(ggplot2)
install.packages('patchwork') # first time installation
library('patchwork') # load the library into your R environment

We will use the Texas Housing txhousing dataset from ggplot2 to demo the capabilities of patchwork. Let's quickly check the dimensions and first few rows of the dataset.

data = ggplot2::txhousing
dim(data)
head(data)

It has 9 columns and 8602 observations and contains city-wise housing sales and inventory data for the US state of Texas, recorded monthly from 2000 to 2015. You can run help(txhousing) in your R console for a detailed description of the columns. For the purpose of our demo, let us consider only the data for the city of Dallas.

dallas = data[data$city == "Dallas",] # take a subset of data only for Dallas
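Before plotting, it can be worth checking the columns that the plots rely on. The following is only a minimal sketch: it rebuilds the same Dallas subset, prints its structure, and counts missing values in the four numeric columns used later, since mean() returns NA if any value is missing. The choice of checks is my own assumption about what is useful here, not part of the original workflow.

```r
library(ggplot2)

# Rebuild the subset so this snippet runs on its own
data = ggplot2::txhousing
dallas = data[data$city == "Dallas", ]

str(dallas)          # column names and types
range(dallas$year)   # first and last year covered

# Missing values per column; mean() would need na.rm = TRUE if any count is non-zero
colSums(is.na(dallas[, c("sales", "volume", "listings", "inventory")]))
```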

We will now create some ggplots that will then be combined with patchwork.

library(dplyr) # provides %>%, group_by(), summarise() and filter() used below

# Mean monthly sales per year
avg.monthly.sales = dallas %>% group_by(year) %>% summarise(mean = mean(sales))
mytheme = theme(panel.background = element_rect(fill = "powderblue")) # set a ggplot theme for repeated use later
p1 = ggplot(avg.monthly.sales, aes(year, mean)) +
  geom_col(fill = "gold", color = "navyblue") +
  ylab("Mean Monthly Sales (Nos)") +
  mytheme + labs(title = "Dallas - 2000 to 2015")
p1

# Mean monthly sales volume per year
avg.monthly.vol = dallas %>% group_by(year) %>% summarise(mean = mean(volume))
p2 = ggplot(avg.monthly.vol, aes(year, mean)) +
  geom_col(fill = "gold", color = "navyblue") +
  ylab("Mean Monthly volumes in USD") +
  mytheme + labs(title = "Dallas - 2000 to 2015")
p2

# Listings by month for 2014 only
dal_2014 = dallas %>% filter(year == 2014)
p3 = ggplot(dal_2014, aes(month, listings)) +
  geom_line(color = "navyblue") + mytheme +
  scale_x_continuous(breaks = seq(1, 12, 1)) +
  labs(title = "Dallas 2014: Property Listing Count")
p3

# Mean inventory per year
avg.monthly.inventory = dallas %>% group_by(year) %>% summarise(mean = mean(inventory))
p4 = ggplot(avg.monthly.inventory, aes(year, mean)) +
  geom_col(fill = "gold", color = "navyblue") +
  ylab("Mean Inventory (No of months)") +
  mytheme + labs(title = "Dallas - 2000 to 2015")
p4

Now, assume that we want to display 2 or more of these plots as a single graphic. Extensions like gridExtra do a competent job, but anything beyond a regular grid with one subplot per cell takes extra layout code, such as a layout matrix. Here is where patchwork hits it right out of the park with virtually no layout code required. Watch!

p1 + p2

That's it! As simple as addition. Now for some division as well.

(p1 + p2) / p3

And stack it some more!

Photo by amirali mirhashemian on Unsplash
p1 / (p3 + p4) / p2
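The operators decide where each plot goes, while relative sizes can be set with patchwork's plot_layout(). Here is a minimal sketch, assuming two stand-in plots (wide_plot and narrow_plot are hypothetical names, not the p1 to p4 built earlier) so that it runs on its own:

```r
library(ggplot2)
library(patchwork)

dallas = subset(ggplot2::txhousing, city == "Dallas")

# Two simple stand-in plots (hypothetical, for illustration only)
wide_plot   = ggplot(dallas, aes(date, sales)) + geom_line()
narrow_plot = ggplot(dallas, aes(factor(month), sales)) + geom_boxplot()

# widths sets the relative width of each column: the first plot gets twice the space
side_by_side = wide_plot + narrow_plot + plot_layout(widths = c(2, 1))

# heights does the same for rows in a stacked layout
stacked = wide_plot / narrow_plot + plot_layout(heights = c(1, 3))

side_by_side
stacked
```

The values in widths and heights are relative, so c(2, 1) and c(4, 2) should give the same proportions.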

Awesome, right? The package also has an option to consolidate all the identical legends of the subplots in the patchwork into a single legend, along with more features. I will post a separate article covering this.
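As a small preview of the legend consolidation, here is a hedged sketch. The plots built so far carry no legends, so it assumes two new plots (sales_plot and listings_plot, hypothetical names) coloured by city for three cities from txhousing. plot_layout(guides = "collect") asks patchwork to merge identical legends, and plot_annotation() adds an overall title.

```r
library(ggplot2)
library(patchwork)

# Three cities so that each plot carries the same colour legend
cities = subset(ggplot2::txhousing, city %in% c("Dallas", "Houston", "Austin"))

sales_plot    = ggplot(cities, aes(date, sales, color = city)) + geom_line()
listings_plot = ggplot(cities, aes(date, listings, color = city)) + geom_line()

# Merge the two identical legends into one and add a title for the whole graphic
combined = sales_plot + listings_plot +
  plot_layout(guides = "collect") +
  plot_annotation(title = "Texas housing: sales and listings")
combined
```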

For another handy tool for Exploratory Data Analysis (EDA) in R, see my article on SmartEDA.

Thank you for reading and please leave your comments and feedback.


Passionate about Data Analytics, Visualization and Machine Learning with extensive experience across functions in India’s emerging Fintech vertical