
Create Beautiful Plots Easily with these R Packages

Produce elegant and specialized charts with little effort

Image by author.

Table of contents

  1. Introduction
  2. Packages
     2.1 ggmap
     2.2 ggpubr
     2.3 patchwork
     2.4 ggforce
  3. Conclusions
  4. References

1. Introduction

ggplot2¹ is a powerful R package for data visualization.

Following The Grammar of Graphics², it defines a plot as a mapping between data and:

  • Aesthetics: attributes such as color or size.
  • Geometry: objects like lines or bars.

The package implements these independent components in a modular way, which makes it possible to create almost any plot in an elegant and flexible manner.
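To make the idea concrete, here is a minimal sketch of how data, aesthetics and geometry fit together. It uses the mtcars dataset that ships with R rather than the housing data, and the axis labels are our own choice.

```r
library(ggplot2)

# mtcars ships with R, so this runs without any external file.
p <- ggplot(data    = mtcars,                      # data
            mapping = aes(x      = wt,             # aesthetics: position...
                          y      = mpg,
                          colour = factor(cyl))) + # ...and colour
  geom_point(size = 3) +                           # geometry: points
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")

# Adding another geometry reuses the same data and aesthetics.
p_smooth <- p + geom_smooth(method = "lm", se = FALSE)
```

Printing p or p_smooth draws the chart. Because each component is independent, changing the geometry does not require touching the data or the aesthetic mapping.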

In this post, we explore four R packages that simplify the generation of advanced plots based on ggplot2. In particular:

  • ggmap adds geographical context to spatial graphs.
  • ggpubr generates charts for publication.
  • patchwork combines separate plots into the same graphic.
  • ggforce provides methods to customize ggplot2 charts and generate more specialized visualizations.

We use the California Housing Prices dataset, available on Kaggle³ under a free license (CC0⁴). It contains median house prices for California districts derived from the 1990 census.

We start by importing the needed packages:

library(tidyverse)
library(ggmap)
library(ggforce)
library(ggpubr)
library(patchwork)
library(concaveman) # needed for hull geometry

They can be installed by typing install.packages("package_name") in the R console.
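As a convenience, the following sketch installs only the packages that are still missing. The helper name missing_packages is hypothetical (it is not part of any package), and the interactive() guard is our own choice so that a script never installs anything unattended.

```r
# Packages used in this post (concaveman is needed for the hull geometry).
pkgs <- c("tidyverse", "ggmap", "ggforce", "ggpubr", "patchwork", "concaveman")

# Hypothetical helper: return the packages that are not installed yet.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

to_install <- missing_packages(pkgs)

# Only install from an interactive session.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```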

tidyverse contains different packages. Among them, we will use:

  • dplyr and tibble for data manipulation.
  • readr to load the csv file.
  • ggplot2 for visualization.

Let us import the dataset and apply a simple binning on the price and income variables:

df <- 
  read_delim(file      = "C:/data/housing.csv",
    col_types = "dddddddddf",
    delim     = ",")
# Create bins for income and house price
df <- df %>% 
  mutate(
    price_bin  = as.factor(ntile(median_house_value, 4)),
    income_bin = ntile(median_income, 4))
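To see what ntile does, consider this small sketch on made-up values (they are not taken from the dataset). With four bins, ntile ranks the values and splits them into four groups of roughly equal size, numbered from lowest to highest.

```r
library(dplyr)

# Toy data: eight made-up house values.
toy <- data.frame(value = c(10, 20, 30, 40, 50, 60, 70, 80))

toy_binned <- toy %>%
  mutate(bin = ntile(value, 4))  # 1 = lowest quarter, 4 = highest quarter

# Number of rows that fall in each bin.
bin_sizes <- count(toy_binned, bin)
```

Here each bin receives two rows. On the housing data, the bins are therefore quartiles of price and income.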

2. Packages

2.1 ggmap

Authors: David Kahle, Hadley Wickham

When a dataset contains spatial information, it is interesting to observe variables in their geographic context.

We start by representing latitude and longitude using ggplot2 and catch a glimpse of California’s shape:

df %>%
  ggplot(aes(x = longitude, y = latitude)) +
  geom_point()
Image by author.

ggmap⁵ uses the same layered grammar as ggplot2 to add geographical context to plots. It makes it easy to include static maps, for example from Google Maps, OpenStreetMap or Stamen Maps.

We simply need to pass the latitude and longitude information to the qmplot⁶ function as follows:

df %>%
  qmplot(x    = longitude, 
         y    = latitude, 
         data = .)
Image by author.

We can apply a color gradient based on house prices:

df %>% 
  qmplot(x      = longitude, 
         y      = latitude, 
         data   = ., 
         colour = median_house_value, 
         legend = "topright") +
  scale_color_gradient("House Prices", 
                       low  = "blue",
                       high = "red")
Image by author.

We may filter our coordinates to explore specific areas. For example, we can observe house prices around Sacramento (Lat: 38.575764, Lon: -121.478851):

df %>% 
  filter(between(longitude, -121.6, -121.3), 
         between(latitude, 38.4, 38.7)) %>%
  qmplot(x      = longitude, 
         y      = latitude, 
         data   = ., 
         colour = median_house_value, 
         size   = I(7), 
         alpha  = I(0.8), 
         legend = "topright") +
  scale_color_gradient("House Prices", 
                       low  = "blue", 
                       high = "red")
Image by author.

More information and examples can be found in the ggmap article⁷.

2.2 ggpubr

Author: Alboukadel Kassambara

Customizing and formatting base ggplots requires a deeper knowledge of the syntax and more advanced R skills.

The ggpubr⁸ package facilitates the creation of beautiful plots. It provides easy-to-use functions to generate publication-ready plots for researchers and R practitioners. In brief, it is a "wrapper" around ggplot2 that handles most of the complexity of plot customization.

For example, we can produce a well-formatted boxplot with a single function call:

df %>%
 ggboxplot(x       = "income_bin", 
           y       = "median_house_value",
           fill    = "income_bin",
           palette = c("#2e00fa", "#a000bc", "#ca0086", "#e40058"))
Image by author.

The package can also add p-values and significance levels to charts. In this example, we create a violin plot using ggviolin⁹ and add mean comparisons with stat_compare_means¹⁰:

# Comparison between the Income groups
bin_comparisons <- list( c("1", "2"), 
                         c("2", "3"), 
                         c("3", "4"),
                         c("1", "3"),
                         c("2", "4"),
                         c("1", "4"))
df %>%
 ggviolin(x          = "income_bin", 
          y          = "median_house_value", 
          title      = "Violin Plot",
          xlab       = "Income Levels",
          ylab       = "House Prices",
          fill       = "income_bin",
          alpha      = 0.8,
          palette    = c("#2e00fa","#a000bc","#ca0086","#e40058"),
          add        = "boxplot", 
          add.params = list(fill = "white")) +
  stat_compare_means(comparisons = bin_comparisons, 
                     label       = "p.signif") +
  stat_compare_means(label.y = 9000)
Violin plots are similar to box plots, except that they also show the kernel probability density of the data. Image by author.
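To inspect the numbers behind such annotations, ggpubr also exposes compare_means, which returns the test results as a table. The sketch below uses the ToothGrowth dataset that ships with R instead of the housing data. As far as we know, stat_compare_means defaults to a Wilcoxon test for pairs of groups and a Kruskal-Wallis test across more than two groups, so we request those methods explicitly here.

```r
library(ggpubr)

# ToothGrowth ships with R; dose has three levels (0.5, 1 and 2).
# Pairwise comparisons: one row per pair of dose groups.
pairwise <- compare_means(len ~ dose,
                          data   = ToothGrowth,
                          method = "wilcox.test")

# Keep the columns that correspond to what the plot annotates.
pairwise_summary <- pairwise[, c("group1", "group2", "p", "p.adj", "p.signif")]

# Global comparison across all dose groups at once.
overall <- compare_means(len ~ dose,
                         data   = ToothGrowth,
                         method = "kruskal.test")
```

The p.signif column holds the significance symbols that label = "p.signif" prints on the chart.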

ggpubr provides several functions to easily generate well-formatted plots. A density plot and a histogram follow as examples:

df %>%
  ggdensity(x       = "median_income",
            add     = "mean", 
            rug     = TRUE,
            color   = "price_bin", 
            fill    = "price_bin",
            title   = "Density plot",
            xlab    = "Income",
            ylab    = "Density",
            palette = c("#2e00fa", "#a000bc", "#ca0086", "#e40058")
   )
Image by author.
df %>%
  gghistogram(x       = "median_income",
              add     = "mean", 
              rug     = TRUE,
              color   = "price_bin", 
              fill    = "price_bin",
              title   = "Histogram plot",
              xlab    = "Income",
              ylab    = "Count",
              palette = c("#2e00fa","#a000bc","#ca0086","#e40058")
   )
Image by author.

We can also arrange multiple plots on a single page with the ggarrange function. Setting its common.legend argument to TRUE additionally merges the legends into one; the call below keeps them separate:

p1 <- df %>%
  ggdensity(x       = "median_income", 
            add     = "mean", 
            rug     = TRUE,
            color   = "price_bin", 
            fill    = "price_bin",
            title   = "Density plot",
            xlab    = "Income",
            ylab    = "Density",
            palette =  c("#2e00fa","#a000bc","#ca0086","#e40058")
   )
p2 <- df %>%
  gghistogram(x       = "median_income",
              add     = "mean", 
              rug     = TRUE,
              color   = "price_bin", 
              fill    = "price_bin",
              title   = "Histogram plot",
              xlab    = "Income",
              ylab    = "Count",
              palette = c("#2e00fa","#a000bc","#ca0086","#e40058")
   )
ggarrange(p1, 
          p2, 
          labels = c("A", "B"),
          ncol   = 2
   )
Image by author.
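When both plots share the same grouping, a single legend is usually enough. The following sketch shows the common.legend argument of ggarrange on the iris dataset that ships with R; the choice of variables and the legend position are ours.

```r
library(ggpubr)

# Both plots colour by Species, so one legend can serve both.
d <- ggdensity(iris,
               x     = "Sepal.Length",
               color = "Species",
               fill  = "Species")
h <- gghistogram(iris,
                 x     = "Sepal.Length",
                 color = "Species",
                 fill  = "Species",
                 bins  = 20)

combined <- ggarrange(d, h,
                      labels        = c("A", "B"),
                      ncol          = 2,
                      common.legend = TRUE,      # keep a single legend
                      legend        = "bottom")  # and place it below the plots
```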

The official documentation¹¹ contains further information and more examples.

2.3 patchwork

Author: Thomas Lin Pedersen

patchwork¹² makes it easy to combine separate ggplot2 visualizations into the same graphic. It also overloads operators such as +, | and / so that plots can be composed more intuitively.

As in the previous ggarrange example, we can place two plots on the same graphic with:

p1 + p2
Image by author.

The greatest strength of the package is that it combines a simple, intuitive API with the ability to create arbitrarily complex compositions of plots. Some examples follow:

Image by author.
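Since the code for these compositions is not shown, here is a minimal sketch of the kind of layouts patchwork supports. It does not reproduce the figure above: the three plots are built from the mtcars dataset that ships with R, and the title and height ratio are our own choices.

```r
library(ggplot2)
library(patchwork)

# Three simple plots to compose.
a  <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
b  <- ggplot(mtcars, aes(factor(cyl), mpg)) + geom_boxplot()
c3 <- ggplot(mtcars, aes(mpg)) + geom_histogram(bins = 10)

side_by_side <- a | b          # same row
stacked      <- a / b          # same column
nested       <- (a | b) / c3   # two plots on top, one full-width plot below

# Control relative heights and add a title plus automatic panel tags.
annotated <- nested +
  plot_layout(heights = c(2, 1)) +
  plot_annotation(title = "mtcars overview", tag_levels = "A")
```

Parentheses group plots exactly as they group terms in an arithmetic expression, which is what keeps nested layouts readable.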

Further information and more examples can be found in the patchwork guide¹³.

2.4 ggforce

Author: Thomas Lin Pedersen

ggforce¹⁴ extends ggplot2 to provide facilities for composing specialized plots.

Before sharing some examples, we create a new column that approximately identifies some renowned locations and their surrounding areas:

  • Los Angeles (Lat: 34.052235, Lon: -118.243683)
  • San Jose (Lat: 37.335480, Lon: -121.893028)
  • San Francisco (Lat: 37.773972, Lon: -122.431297)
  • Sacramento (Lat: 38.575764, Lon: -121.478851)
  • San Diego (Lat: 32.715736, Lon: -117.161087)
  • Fresno (Lat: 36.746841, Lon: -119.772591)
  • Stockton (Lat: 37.961632, Lon: -121.275604)
df <- df %>% 
  mutate(area = case_when(
    between(longitude, -118.54, -117.94) &
      between(latitude, 33.75, 34.35) ~ 'Los Angeles',
    between(longitude, -122.19, -121.59) &
      between(latitude, 37.03, 37.63) ~ 'San Jose',
    between(longitude, -122.73, -122.13) &
      between(latitude, 37.47, 38.07) ~ 'San Francisco',
    between(longitude, -121.77, -121.17) &
      between(latitude, 38.27, 38.87) ~ 'Sacramento',
    between(longitude, -117.46, -116.86) &
      between(latitude, 32.41, 33.01) ~ 'San Diego',
    between(longitude, -120.07, -119.47) &
      between(latitude, 36.44, 37.04) ~ 'Fresno',
    between(longitude, -121.57, -120.97) &
      between(latitude, 37.66, 38.26) ~ 'Stockton',
    TRUE ~ 'Other areas'
    )
  ) %>%
  mutate_if(is.character, as.factor)
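Each bounding box above spans roughly 0.3 degrees on either side of the city's coordinates. As an alternative sketch, the same labelling can be driven by a table of centers. The helper label_area and the half_width value are hypothetical names of ours, and because the boxes above were rounded to two decimals, points very close to a box edge may be labelled differently.

```r
# City centers copied from the list above.
centers <- data.frame(
  area = c("Los Angeles", "San Jose", "San Francisco", "Sacramento",
           "San Diego", "Fresno", "Stockton"),
  lat  = c(34.052235, 37.335480, 37.773972, 38.575764,
           32.715736, 36.746841, 37.961632),
  lon  = c(-118.243683, -121.893028, -122.431297, -121.478851,
           -117.161087, -119.772591, -121.275604)
)
half_width <- 0.3  # assumed half side of each box, in degrees

# Hypothetical helper: label each point with the first center whose box
# contains it, or "Other areas" when no box does.
label_area <- function(lon, lat, centers, half_width) {
  area <- rep("Other areas", length(lon))
  # Iterate in reverse so that earlier rows win, as in case_when().
  for (i in rev(seq_len(nrow(centers)))) {
    inside <- abs(lon - centers$lon[i]) <= half_width &
      abs(lat - centers$lat[i]) <= half_width
    area[which(inside)] <- centers$area[i]
  }
  factor(area)
}

# Three made-up points: near Los Angeles, near Sacramento, and far north.
pts <- data.frame(longitude = c(-118.25, -121.48, -120.00),
                  latitude  = c(34.05, 38.58, 40.00))
pts$area <- label_area(pts$longitude, pts$latitude, centers, half_width)
```

On the housing data, the equivalent call would be df %>% mutate(area = label_area(longitude, latitude, centers, half_width)). Adding a city then only requires a new row in centers.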

We observe the outcome:

df %>%
  filter(area != "Other areas") %>%
  ggplot(aes( x = longitude, y = latitude)) +
  geom_point()
Image by author.

ggforce provides different functions to highlight sets of data. One may draw an outline around data groups with different shapes:

  • circles: geom_mark_circle()
  • ellipses: geom_mark_ellipse()
  • rectangles: geom_mark_rect()
  • hulls (convex closures): geom_mark_hull()

We can try drawing a rectangle around the geographical areas:

df %>%
  filter(area != "Other areas") %>%
  ggplot(aes(x = longitude, y = latitude, color = area)) +
  geom_mark_rect(aes(fill = area)) +
  geom_point()
Image by author.
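The circle and ellipse variants listed above are used in the same way. The following sketch is independent of the housing data: it uses the iris dataset that ships with R, with variables chosen by us for illustration.

```r
library(ggplot2)
library(ggforce)

base_plot <- ggplot(iris, aes(x     = Petal.Length,
                              y     = Petal.Width,
                              color = Species))

# Enclose each species in an ellipse.
ellipses <- base_plot +
  geom_mark_ellipse(aes(fill = Species)) +
  geom_point()

# Same data, but with circles as the enclosing shape.
circles <- base_plot +
  geom_mark_circle(aes(fill = Species)) +
  geom_point()
```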

We can try to draw more complex polygons with geom_mark_hull(). It is possible to adjust the concavity of the resulting hulls through the concavity parameter:

df %>%
  filter(area != "Other areas") %>%
  ggplot(aes(x = longitude, y = latitude, color = area)) +
  geom_mark_hull(aes(fill = area), concavity=10) +
  geom_point()
Image by author.

It is possible to add labels to the groups on the plot:

df %>%
  filter(area != "Other areas") %>%
  ggplot(aes(x = longitude, y = latitude, color = area)) +
  geom_mark_hull(aes(fill = area, label = area), concavity = 10) +
  geom_point()
Image by author.

We can also combine the plots with the geographical information from ggmap as follows:

df %>%
  filter(area != "Other areas") %>%
  qmplot(x = longitude, y = latitude, data = .) +
  geom_mark_hull(aes(fill  = area, label = area), concavity = 10) +
  geom_point()
Image by author.

Additional information and further examples can be found in the ggforce documentation¹⁵.

3. Conclusions

R is a powerful tool for data analysis and visualization.

In this post, our goal was not to share a complete data analysis or a solution to a machine learning problem. Rather, we wanted to explore some packages that simplify the task of creating beautiful visualizations.

The referenced documentation provides further examples and information.

For more insights on the Grammar of Graphics and the concepts behind ggplot2, we recommend Hadley Wickham’s "ggplot2 – Elegant Graphics for Data Analysis"¹.

4. References

[1] Hadley Wickham, "ggplot2 – Elegant Graphics for Data Analysis", Springer, 2009.

[2] Leland Wilkinson, "The Grammar of Graphics", Springer, 2005.

[3] https://www.kaggle.com/datasets/camnugent/california-housing-prices

[4] https://creativecommons.org/publicdomain/zero/1.0/

[5] https://cran.r-project.org/package=ggmap

[6] https://www.rdocumentation.org/packages/ggmap/versions/3.0.0/topics/qmplot

[7] David Kahle and Hadley Wickham, "ggmap: Spatial Visualization with ggplot2", The R Journal Vol. 5/1, June 2013.

[8] https://cran.r-project.org/package=ggpubr

[9] https://www.rdocumentation.org/packages/ggpubr/versions/0.4.0/topics/ggviolin

[10] https://www.rdocumentation.org/packages/ggpubr/versions/0.4.0/topics/stat_compare_means

[11] https://rpkgs.datanovia.com/ggpubr/

[12] https://cran.r-project.org/package=patchwork

[13] https://patchwork.data-imaginist.com/articles/guides/assembly.html

[14] https://cran.r-project.org/package=ggforce

[15] https://ggforce.data-imaginist.com/

