
1. Introduction
ggplot2¹ is a powerful R package for data visualization.
Following The Grammar of Graphics², it defines a plot as a mapping between data and:
- Aesthetics: attributes such as color or size.
- Geometry: objects like lines or bars.
The package implements these independent components in a modular way, which makes it possible to create almost any plot in an elegant and flexible manner.
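We can see this modularity in a minimal sketch. It uses R's built-in mtcars data frame, chosen here only for illustration because it is not part of this post's dataset. The data and the aesthetic mapping are declared once, and different geometries are then layered on top of the same base.

```r
library(ggplot2)

# Data + aesthetics: weight on x, fuel consumption on y, cylinders as colour
base <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl)))

# Geometry 1: one point per car
p_points <- base + geom_point()

# Geometry 2: same data and aesthetics, drawn as per-group linear fits
p_fits <- base + geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Components stack: both geometries in a single plot
p_both <- base +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
```

Only the geometry changes between `p_points` and `p_fits`. The data and the mapping are reused as they are.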
In this post, we explore four R packages that simplify the generation of advanced plots based on ggplot2. In particular:
- ggmap adds geographical context to spatial graphs.
- ggpubr generates publication-ready charts.
- patchwork combines separate plots into the same graphic.
- ggforce provides methods to customize ggplot2 charts and generate more specialized visualizations.
We use the California Housing Prices dataset, available on Kaggle³ under a free license (CC0⁴). It contains median house prices for California districts derived from the 1990 census.
We start by importing the needed packages:
library(tidyverse)
library(ggmap)
library(ggforce)
library(ggpubr)
library(patchwork)
library(concaveman) # needed for hull geometry
Any missing package can be installed by typing install.packages("package_name") in the R console.
tidyverse contains several packages. Among them, we will use:
- dplyr and tibble for data manipulation.
- readr to load the csv file.
- ggplot2 for visualization.
Let us import the dataset and split the price and income variables into quartile bins:
df <-
read_delim(file = "C:/data/housing.csv",
col_types = "dddddddddf",
delim = ",")
# Create bins for income and house price
df <- df %>%
mutate(
price_bin = as.factor(ntile(median_house_value, 4)),
income_bin = ntile(median_income, 4))
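To see what ntile() does, here is a minimal sketch on a hypothetical toy tibble of eight incomes. The values are made up and not taken from the dataset. ntile(x, 4) ranks the values and splits them into four equally sized groups, so bin 1 holds the lowest quarter and bin 4 the highest.

```r
library(dplyr)

# Hypothetical toy data: eight median incomes (made-up values)
toy <- tibble(median_income = c(8.3, 2.1, 5.6, 3.4, 1.5, 7.2, 4.8, 2.9))

toy <- toy %>%
  mutate(
    income_bin = ntile(median_income, 4),  # integer quartile: 1 (lowest) to 4 (highest)
    income_bin_f = as.factor(income_bin)   # same bins as a factor, as done for price_bin
  )

# Each bin holds the same number of rows (8 / 4 = 2)
bin_sizes <- table(toy$income_bin)
```

Because ntile() splits by rank, the bins are equal in size rather than in value range. Tied values at a bin boundary may also end up in different bins.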
2. Packages
2.1 ggmap
Authors: David Kahle, Hadley Wickham
When a dataset contains spatial information, it is interesting to observe variables in their geographic context.
We start by plotting latitude and longitude with ggplot2 to catch a glimpse of California's shape:
df %>%
ggplot(aes(x = longitude, y = latitude)) +
geom_point()
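On plain Cartesian axes the aspect ratio is not fixed, so the state's outline depends on the size of the output device. In addition, at about 37°N a degree of longitude is only about 80% as long as a degree of latitude. One possible fix is ggplot2's coord_quickmap(), which sets an approximate aspect ratio for map-like data. The sketch below runs without the CSV file by using a small hypothetical data frame built from the city coordinates listed in Section 2.4. With the post's own data, `+ coord_quickmap()` would simply be appended to the plot above.

```r
library(ggplot2)

# Hypothetical sample: a few California cities (coordinates from Section 2.4)
cities <- data.frame(
  city = c("Los Angeles", "San Francisco", "Sacramento", "San Diego", "Fresno"),
  latitude = c(34.052235, 37.773972, 38.575764, 32.715736, 36.746841),
  longitude = c(-118.243683, -122.431297, -121.478851, -117.161087, -119.772591)
)

# coord_quickmap() fixes the aspect ratio so that north-south and
# east-west distances are drawn at roughly the same scale
p <- ggplot(cities, aes(x = longitude, y = latitude)) +
  geom_point() +
  coord_quickmap()

# Length of one degree of longitude relative to one degree of latitude at 37N
lon_scale <- cos(37 * pi / 180)
```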

ggmap⁵ uses the same layered grammar as ggplot2 to add geographical context to plots. It makes it easy to include static maps, for example from Google Maps, OpenStreetMap or Stamen Maps.
We simply need to pass the latitude and longitude information to the qmplot⁶ function as follows:
df %>%
qmplot(x = longitude,
y = latitude,
data = .)

We can apply a color gradient based on house prices:
df %>%
qmplot(x = longitude,
y = latitude,
data = .,
colour = median_house_value,
legend = "topright") +
scale_color_gradient("House Prices",
low = "blue",
high = "red")

We may filter our coordinates to explore specific areas. For example, we can observe house prices around Sacramento (Lat: 38.575764, Lon: -121.478851):
df %>%
filter(between(longitude, -121.6, -121.3),
between(latitude, 38.4, 38.7)) %>%
qmplot(x = longitude,
y = latitude,
data = .,
colour = median_house_value,
size = I(7),
alpha = I(0.8),
legend = "topright") +
scale_color_gradient("House Prices",
low = "blue",
high = "red")
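The two between() calls above define a small window around Sacramento. To explore several cities, we could wrap that window in a helper. filter_around() below is hypothetical and is not part of ggmap or dplyr. It builds the window from a centre point and a half-width in degrees. Centring on the exact coordinates gives limits slightly different from the rounded ones used above.

```r
library(dplyr)

# Hypothetical helper: keep rows whose coordinates fall inside a square
# window of +/- half_width degrees around a centre point (between() is inclusive)
filter_around <- function(data, centre_lat, centre_lon, half_width = 0.15) {
  data %>%
    filter(between(longitude, centre_lon - half_width, centre_lon + half_width),
           between(latitude, centre_lat - half_width, centre_lat + half_width))
}

# Hypothetical sample points (city coordinates listed in Section 2.4)
cities <- tibble(
  city = c("Sacramento", "Stockton", "San Jose"),
  latitude = c(38.575764, 37.961632, 37.335480),
  longitude = c(-121.478851, -121.275604, -121.893028)
)

near_sacramento <- filter_around(cities, centre_lat = 38.575764, centre_lon = -121.478851)
```

With the post's data, the filtered tibble could then be piped into qmplot() as before, for example df %>% filter_around(38.575764, -121.478851) %>% qmplot(x = longitude, y = latitude, data = .).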

More information and examples can be found here⁷.
2.2 ggpubr
Author: Alboukadel Kassambara
Customizing and formatting base ggplot2 charts requires deeper knowledge of the syntax and more advanced R skills.
The ggpubr⁸ package makes it easy to create attractive plots. It provides easy-to-use functions that generate publication-ready plots for researchers and R practitioners. In brief, it is a "wrapper" around ggplot2 that handles most of the complexity of plot customization.
For example, we can produce a well-formatted boxplot with a single function call:
df %>%
ggboxplot(x = "income_bin",
y = "median_house_value",
fill = "income_bin",
palette = c("#2e00fa", "#a000bc", "#ca0086", "#e40058"))
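For comparison, here is a rough sketch of a similar boxplot written directly in ggplot2. It only approximates what ggboxplot() produces, because ggpubr applies its own defaults, including the theme_pubr() theme. Here theme_classic() stands in for that theme, and the toy data frame is hypothetical. The sketch shows the pieces the wrapper handles for us:
- treating the grouping variable as a factor;
- mapping the fill;
- applying a manual palette;
- styling the theme.

```r
library(ggplot2)

# Hypothetical toy data: house values for four income bins
set.seed(1)
toy <- data.frame(
  income_bin = rep(1:4, each = 25),
  median_house_value = rnorm(100, mean = rep(c(1, 2, 3, 4), each = 25) * 1e5, sd = 3e4)
)

bin_colours <- c("#2e00fa", "#a000bc", "#ca0086", "#e40058")

p <- ggplot(toy, aes(x = factor(income_bin), y = median_house_value,
                     fill = factor(income_bin))) +
  geom_boxplot() +
  scale_fill_manual(values = bin_colours) +  # same palette as the ggboxplot() call
  labs(x = "income_bin", y = "median_house_value", fill = "income_bin") +
  theme_classic()                            # stand-in for ggpubr's default theme
```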

The package can also add p-values and significance levels to charts. In this example, we create a violin plot using ggviolin⁹ and add mean comparisons with stat_compare_means¹⁰:
# Comparison between the Income groups
bin_comparisons <- list( c("1", "2"),
c("2", "3"),
c("3", "4"),
c("1", "3"),
c("2", "4"),
c("1", "4"))
df %>%
ggviolin(x = "income_bin",
y = "median_house_value",
title = "Violin Plot",
xlab = "Income Levels",
ylab = "House Prices",
fill = "income_bin",
alpha = 0.8,
palette = c("#2e00fa","#a000bc","#ca0086","#e40058"),
add = "boxplot",
add.params = list(fill = "white")) +
stat_compare_means(comparisons = bin_comparisons,
label = "p.signif") +
stat_compare_means(label.y = 9000)
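ggpubr's compare_means() returns the p-values behind these annotations as a table. By default, stat_compare_means() relies on non-parametric tests:
- the Wilcoxon rank-sum test for pairwise comparisons;
- the Kruskal-Wallis test for the global comparison of more than two groups.

These tests therefore compare distributions rather than strictly means. The sketch below uses simulated, hypothetical data in place of the housing dataset.

```r
library(ggpubr)

# Hypothetical simulated data: four income bins with increasing house values
set.seed(42)
sim <- data.frame(
  income_bin = factor(rep(c("1", "2", "3", "4"), each = 50)),
  median_house_value = rnorm(200, mean = rep(c(1, 2, 3, 4), each = 50) * 1e5, sd = 2e4)
)

# Pairwise Wilcoxon tests (the default method of compare_means())
pairwise <- compare_means(median_house_value ~ income_bin, data = sim)

# Global Kruskal-Wallis test across all four bins
global <- compare_means(median_house_value ~ income_bin, data = sim,
                        method = "kruskal.test")

# Adjusted p-values and significance stars for each pair of bins
pairwise[, c("group1", "group2", "p.adj", "p.signif")]
```

With four groups, the pairwise table covers all six combinations, which is the same set listed in bin_comparisons.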

ggpubr provides several functions that easily generate well-formatted plots. A density plot and a histogram follow as examples:
df %>%
ggdensity(x = "median_income",
add = "mean",
rug = TRUE,
color = "price_bin",
fill = "price_bin",
title = "Density plot",
xlab = "Income",
ylab = "Density",
palette = c("#2e00fa", "#a000bc", "#ca0086", "#e40058")
)

df %>%
gghistogram(x = "median_income",
add = "mean",
rug = TRUE,
color = "price_bin",
fill = "price_bin",
title = "Histogram plot",
xlab = "Income",
ylab = "Count",
palette = c("#2e00fa","#a000bc","#ca0086","#e40058")
)
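With add = "mean", both plots draw one vertical line per price bin at that bin's mean income. A minimal dplyr sketch on a hypothetical toy tibble computes the same summary. This is useful for checking the lines or for quoting the values in the text.

```r
library(dplyr)

# Hypothetical toy data: incomes in two price bins
toy <- tibble(
  price_bin = factor(c("1", "1", "1", "2", "2", "2")),
  median_income = c(2.0, 3.0, 4.0, 5.0, 6.5, 8.0)
)

# One row per bin: the mean income (where each mean line is drawn) and the bin size
bin_means <- toy %>%
  group_by(price_bin) %>%
  summarise(mean_income = mean(median_income), n = n())
```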

We can also arrange multiple plots on a single page, and even share a single common legend, through the ggarrange function:
p1 <- df %>%
ggdensity(x = "median_income",
add = "mean",
rug = TRUE,
color = "price_bin",
fill = "price_bin",
title = "Density plot",
xlab = "Income",
ylab = "Density",
palette = c("#2e00fa","#a000bc","#ca0086","#e40058")
)
p2 <- df %>%
gghistogram(x = "median_income",
add = "mean",
rug = TRUE,
color = "price_bin",
fill = "price_bin",
title = "Histogram plot",
xlab = "Income",
ylab = "Count",
palette = c("#2e00fa","#a000bc","#ca0086","#e40058")
)
ggarrange(p1,
p2,
labels = c("A", "B"),
ncol = 2,
common.legend = TRUE, # one shared legend for price_bin
legend = "bottom"
)

The official documentation¹¹ contains further information and more examples.
2.3 patchwork
Author: Thomas Lin Pedersen
patchwork¹² makes it easy to combine separate ggplot2 visualizations into the same graphic. It also provides mathematical operators to arrange plots more intuitively.
Similarly to the previous ggarrange example, we can place two plots side by side with:
p1 + p2

The greatest strength of the package lies in its blend of simple, intuitive operators with the ability to build arbitrarily complex compositions of plots.
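We can sketch a few typical compositions with patchwork's operators:
- | places plots side by side.
- / stacks plots vertically.
- plot_layout() controls widths and legend collection.
- plot_annotation() adds a title and panel tags.

The three plots below are placeholders built from R's built-in mtcars data rather than from the housing dataset.

```r
library(ggplot2)
library(patchwork)

# Placeholder plots built from the built-in mtcars data
pa <- ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) + geom_point()
pb <- ggplot(mtcars, aes(factor(cyl), mpg, fill = factor(cyl))) + geom_boxplot()
pc <- ggplot(mtcars, aes(hp)) + geom_histogram(bins = 10)

# Two plots side by side, stacked over a full-width third plot
layout1 <- (pa | pb) / pc

# Relative widths 2:1, legends gathered in one place, automatic A/B tags
layout2 <- pa + pb +
  plot_layout(widths = c(2, 1), guides = "collect") +
  plot_annotation(title = "Fuel consumption", tag_levels = "A")
```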

Further information and more examples can be found here¹³.
2.4 ggforce
Author: Thomas Lin Pedersen
ggforce¹⁴ extends ggplot2 with facilities for composing specialized plots.
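Before sharing some examples, we create a new column that approximately identifies some well-known cities and their surrounding areas. Each area is a box extending roughly 0.3 degrees of latitude and longitude in every direction from the city's coordinates: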
- Los Angeles (Lat: 34.052235, Lon: -118.243683)
- San Jose (Lat: 37.335480, Lon: -121.893028)
- San Francisco (Lat: 37.773972, Lon: -122.431297)
- Sacramento (Lat: 38.575764, Lon: -121.478851)
- San Diego (Lat: 32.715736, Lon: -117.161087)
- Fresno (Lat: 36.746841, Lon: -119.772591)
- Stockton (Lat: 37.961632, Lon: -121.275604)
df <- df %>%
mutate(area = case_when(
between(longitude, -118.54, -117.94)
& between(latitude, 33.75, 34.35) ~ 'Los Angeles',
between(longitude, -122.19, -121.59)
& between(latitude, 37.03, 37.63) ~ 'San Jose',
between(longitude, -122.73, -122.13)
& between(latitude, 37.47, 38.07) ~ 'San Francisco',
between(longitude, -121.77, -121.17)
& between(latitude, 38.27, 38.87) ~ 'Sacramento',
between(longitude, -117.46, -116.86)
& between(latitude, 32.41, 33.01) ~ 'San Diego',
between(longitude, -120.07, -119.47)
& between(latitude, 36.44, 37.04) ~ 'Fresno',
between(longitude, -121.57, -120.97)
& between(latitude, 37.66, 38.26) ~ 'Stockton',
TRUE ~ 'Other areas'
)
) %>%
mutate_if(is.character,as.factor)
We plot the result:
df %>%
filter(area != "Other areas") %>%
ggplot(aes( x = longitude, y = latitude)) +
geom_point()
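The case_when() block above hard-codes the limits of every box. As an alternative, a table of city centres can drive the assignment. assign_area() below is a hypothetical helper, not a package function, and its half_width of 0.3 degrees only approximates the rounded limits used above. Like case_when(), it gives each point the first matching city, so a point in the small overlap between the San Jose and San Francisco boxes is labelled San Jose.

```r
library(dplyr)

# City centres listed in the text (decimal degrees)
cities <- tibble(
  area = c("Los Angeles", "San Jose", "San Francisco", "Sacramento",
           "San Diego", "Fresno", "Stockton"),
  lat  = c(34.052235, 37.335480, 37.773972, 38.575764,
           32.715736, 36.746841, 37.961632),
  lon  = c(-118.243683, -121.893028, -122.431297, -121.478851,
           -117.161087, -119.772591, -121.275604)
)

# Hypothetical helper: label each point with the first city whose
# +/- half_width degree box contains it; unmatched points become "Other areas"
assign_area <- function(longitude, latitude, centres, half_width = 0.3) {
  out <- rep("Other areas", length(longitude))
  # Iterate in reverse so earlier cities overwrite later ones (first match wins)
  for (i in rev(seq_len(nrow(centres)))) {
    inside <- abs(longitude - centres$lon[i]) <= half_width &
      abs(latitude - centres$lat[i]) <= half_width
    out[inside] <- centres$area[i]
  }
  factor(out)
}

# Example points: Los Angeles, Sacramento, no city, San Jose/San Francisco overlap
example <- assign_area(
  longitude = c(-118.25, -121.48, -120.00, -122.15),
  latitude  = c(34.05, 38.58, 40.00, 37.55),
  centres   = cities
)
```

With the post's data, this would be used as df %>% mutate(area = assign_area(longitude, latitude, cities)).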

ggforce provides different functions to highlight sets of data. One may draw an outline around groups of data points with different shapes:
- circles: geom_mark_circle()
- ellipses: geom_mark_ellipse()
- rectangles: geom_mark_rect()
- hulls (polygons that can be convex or concave): geom_mark_hull()
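The four outline geoms share the same interface, so switching shapes is a one-word change. The minimal sketch below builds one plot per shape. It uses R's built-in iris data, chosen only because its groups are well separated. As in the imports at the beginning, geom_mark_hull() relies on the concaveman package.

```r
library(ggplot2)
library(ggforce)

# One outline geom per shape, all driven by the same aesthetics
mark_geoms <- list(
  circle  = geom_mark_circle,
  ellipse = geom_mark_ellipse,
  rect    = geom_mark_rect,
  hull    = geom_mark_hull
)

base <- ggplot(iris, aes(x = Petal.Length, y = Petal.Width, colour = Species))

# Build a plot for each shape: outline layer first, then the points on top
plots <- lapply(mark_geoms, function(mark) {
  base + mark(aes(fill = Species)) + geom_point()
})
```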
We can try drawing a rectangle around the geographical areas:
df %>%
filter(area != "Other areas") %>%
ggplot(aes(x = longitude, y = latitude, color = area)) +
geom_mark_rect(aes(fill = area)) +
geom_point()

We can try to draw more complex polygons with geom_mark_hull(). The concavity of the resulting hulls can be adjusted through the concavity parameter:
df %>%
filter(area != "Other areas") %>%
ggplot(aes(x = longitude, y = latitude, color = area)) +
geom_mark_hull(aes(fill = area), concavity=10) +
geom_point()
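According to the ggforce documentation, a concavity of 1 gives very concave outlines, and the hull approaches the convex hull as the value grows. The default is 2. The short sketch below compares two settings side by side with patchwork, again using the built-in iris data. The values 1 and 10 are arbitrary choices for illustration.

```r
library(ggplot2)
library(ggforce)
library(patchwork)

base <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, colour = Species))

# Low concavity: the outline hugs the points closely
tight <- base +
  geom_mark_hull(aes(fill = Species), concavity = 1) +
  geom_point() +
  labs(title = "concavity = 1")

# High concavity: the outline is close to the convex hull
loose <- base +
  geom_mark_hull(aes(fill = Species), concavity = 10) +
  geom_point() +
  labs(title = "concavity = 10")

comparison <- tight | loose
```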

It is possible to add labels to the groups on the plot:
df %>%
filter(area != "Other areas") %>%
ggplot(aes(x = longitude, y = latitude, color = area)) +
geom_mark_hull(aes(fill = area, label = area), concavity = 10) +
geom_point()

We can also combine these plots with the geographical information from ggmap as follows:
df %>%
filter(area != "Other areas") %>%
qmplot(x = longitude, y = latitude, data = .) +
geom_mark_hull(aes(fill = area, label = area), concavity = 10) +
geom_point()

Additional information and further examples can be found here¹⁵.
3. Conclusions
R is a powerful tool for data analysis and visualization.
In this post, our goal was not to present a complete data analysis or to solve a machine learning problem. Rather, we wanted to explore some packages that simplify the creation of beautiful visualizations.
The referenced documentation provides further examples and information.
For more insights on the Grammar of Graphics and the concepts behind ggplot2, we recommend Hadley Wickham's "ggplot2 – Elegant Graphics for Data Analysis"¹.
4. References
[1] Hadley Wickham, "ggplot2 – Elegant Graphics for Data Analysis", Springer, 2009 (public link).
[2] Leland Wilkinson, "The Grammar of Graphics", Springer, 2005.
[3] https://www.kaggle.com/datasets/camnugent/california-housing-prices
[4] https://creativecommons.org/publicdomain/zero/1.0/
[5] https://cran.r-project.org/package=ggmap
[6] https://www.rdocumentation.org/packages/ggmap/versions/3.0.0/topics/qmplot
[7] David Kahle and Hadley Wickham, "ggmap: Spatial Visualization with ggplot2", The R Journal, Vol. 5/1, June 2013.
[8] https://cran.r-project.org/package=ggpubr
[9] https://www.rdocumentation.org/packages/ggpubr/versions/0.4.0/topics/ggviolin
[10] https://www.rdocumentation.org/packages/ggpubr/versions/0.4.0/topics/stat_compare_means
[11] https://rpkgs.datanovia.com/ggpubr/
[12] https://cran.r-project.org/package=patchwork
[13] https://patchwork.data-imaginist.com/articles/guides/assembly.html