
The 7 R Packages You Should Be Using for Data Visualisation

With Example Code and Free Datasets

Photo by Ussama Azam on Unsplash

Introduction

"It’s not what you do, it’s how you do it." ― Cheri Huber, Suffering Is Optional: Three Keys to Freedom and Joy

Having data but not knowing how to visualise it is what I call self-inflicted suffering. So, borrowing from Cheri, here is my take on the 3 keys to freedom and joy: data, R, and these 7 R packages.

The Data

You can follow my code by using the datasets I chose or you can use your own. Here are some great resources on finding free and open-source datasets:

  1. Where to Find Free Datasets & How to Know if They’re Good Quality by Matt David (2021).
  2. R’s built-in datasets. R allows you to use these datasets commercially under the GNU General Public License.
  3. These Are The Best Free Open Data Sources Anyone Can Use by Hiren Patel (2019).
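
If you would rather start with one of R’s built-in datasets, here is a minimal sketch of how to see what is available and load one. It uses base R only; `available` is just a variable name I made up for the example.

```r
# List the datasets bundled with base R's 'datasets' package
available <- data(package = "datasets")$results[, c("Item", "Title")]
head(available)

# Load one of them (mtcars is used again later in this article) and inspect it
data(mtcars)
str(mtcars)
```

Running `?mtcars` opens the help page for a dataset, which describes its columns and original source.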

The 7 (Nonfatal) R Packages

1. GGPLOT2

Let’s start with the original library that makes R the best language for visualisation – ggplot2. I am making use of a dataset from Kaggle, which you can find here (Sustainable Development Solutions Network, 2019, License CC0: Public Domain). The data looks at the state of global happiness:

Source by Author

Code

### GGPLOT2 EXAMPLE CODE
### LOAD KAGGLE DATA (Sustainable Development Solutions Network, 2019, License CC0: Public Domain)
#Load packages
library(openxlsx)
library(dplyr)
library(ggplot2)
library(RColorBrewer)
#Read in the '2019_happiness' worksheet from the Excel workbook 2019_happiness.xlsx
happy <- read.xlsx("2019_happiness.xlsx", sheet = '2019_happiness')
#Top 30 countries by average happiness score
happy %>%
  group_by(Country) %>% 
  summarise(HappinessScore_Avg = mean(Score)) %>%
  top_n(30, HappinessScore_Avg) %>%
  #Plot the average happiness score for each country in the Top 30
  ggplot(aes(x = reorder(factor(Country), HappinessScore_Avg), y = HappinessScore_Avg, fill = HappinessScore_Avg)) + 
  geom_bar(stat = "identity") + 
  xlab("Country") + 
  ylab("Average Happiness Score") +
  # Make sure the bars start at 0 on the axis
  scale_y_continuous(expand = expansion(mult = c(0, 0.05))) +
  # Choose a theme, axis title sizes and labelling
  theme(legend.position = 'none',
        axis.title.y = element_text(size = 8), axis.text.y = element_text(size = 8),
        axis.title.x = element_text(size = 5), axis.text.x = element_text(size = 5),
        plot.title = element_text(size = 10, hjust = 0.5)) +
  scale_fill_gradientn(name = '', colours = rev(brewer.pal(5, 'Spectral'))) +
  geom_text(aes(label = HappinessScore_Avg), hjust = -0.3, size = 2) +
  ggtitle("Top 30 Countries by Happiness Score") +
  coord_flip()

An extension library to ggplot2 is ggforce, which was developed by Thomas Pedersen. It has the added functionality of highlighting different groups and their specific features in your data, helping you to tell your story more effectively. You can see some ggforce examples here.
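
To give a feel for that group highlighting, here is a small sketch using ggforce’s `geom_mark_ellipse()` on R’s built-in iris data. The dataset and the name `p_iris` are my own choices for illustration, and it assumes ggforce is installed from CRAN.

```r
library(ggplot2)
library(ggforce)

# Scatter plot of the built-in iris data with one labelled ellipse per species
p_iris <- ggplot(iris, aes(x = Petal.Length, y = Petal.Width)) +
  geom_mark_ellipse(aes(fill = Species, label = Species)) +
  geom_point() +
  ggtitle("Iris petals, grouped by species")
p_iris
```

Each species gets its own shaded ellipse and label, which draws the eye to the clusters without any manual annotation.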

2. ColourPicker

Don’t waste time googling the hex codes for the colours you want to use. Rather make use of ColourPicker! This is particularly useful if you are required to stick to a customised colour palette. See the demo below:

Source by Author

Here is the code I used in the demo:

##Colour Picker demo using R's built-in dataset called mtcars
##(mtcars source: Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411)
#Install colourpicker (only needed once) and load the necessary libraries
install.packages("colourpicker")
library(colourpicker)
library(ggplot2)
library(ggcorrplot)
cols <- colourPicker()  #the colour picker window will pop up after this line 
cols  # returns the colours you selected
# Create a correlation matrix using R's built-in dataset - mtcars
data(mtcars)
corr <- round(cor(mtcars), 1)
# Plot the correlation matrix; `colors` takes the low, mid and high colours
ggcorrplot(corr, hc.order = TRUE, 
           type = "lower", 
           lab = TRUE, 
           lab_size = 2, 
           method = "circle", 
           colors = c("red2", "white", "dodgerblue2"),
           outline.color = "#0D0C0C",
           title = "Correlogram of car specs")

If you would like to read more about creating custom palettes, see an article I wrote here.
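
Once you have your hex codes, a quick way to stretch them into a longer palette is base R’s `colorRampPalette()`. The sketch below is only an illustration: the three hex codes are made up, and `brand_hex`, `brand_palette`, `shades` and `named_to_hex` are names I chose for the example.

```r
# Brand colours as hex codes (made up for this example)
brand_hex <- c("#1B9E77", "#F7F7F7", "#D95F02")

# colorRampPalette() returns a function that interpolates between the anchors
brand_palette <- colorRampPalette(brand_hex)
shades <- brand_palette(7)
shades

# Look up the hex code of a named R colour such as "dodgerblue2"
named_to_hex <- rgb(t(col2rgb("dodgerblue2")), maxColorValue = 255)
named_to_hex
```

A vector such as `shades` can be passed to `scale_fill_gradientn(colours = shades)` in the same way the ggplot2 example above uses the Spectral palette.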

3. Esquisse

Have the data but not sure how to visualise it? Or, not sure how to write the code in R? I present to you – Esquisse. The esquisse package is open-source software developed by Fanny Meyer and Victor Perrier of dreamRs and released in 2018. Here is the link to the GitHub repository.

For this demo, I am going to search for an open-source dataset by using Google dataset search – essentially a Google for datasets that are freely available:

Source by Author

It directed me to Kaggle, where I found a dataset on Marvel and DC comics (Govindaraghavan, S., 2020, License CC0: Public Domain).

#Import the necessary packages and libraries
install.packages("esquisse")
library(esquisse)
#Run the following:
esquisse::esquisser()

The following screen will appear:

Source by Author

Here you can import your data by uploading, pasting, connecting to a Google sheet or connecting to one of R’s built-in datasets. After uploading the Marvel data, I saw a preview pane before importing it:

Source by Author

Here is a short demo of how esquisse works:

Source by Author

4. PlotlyR

PlotlyR (the plotly package for R) is a free and open-source graphing library. You can view the GitHub repository here. The example I am going to use here visualises the online customer journey as customers move along your marketing funnel.

# Install plotly from GitHub to get funnel plots
devtools::install_github("ropensci/plotly")
library(plotly)
# Here I make up my own data, since the functionality is the purpose of this viz
fig <- plot_ly() 
fig <- fig %>%
  add_trace(
    type = "funnel",
    y = c("Clicked on Google Ad", "Explored Website", "Downloaded Brochure", "Added to basket", "Made Payment"),
    x = c(100, 80, 10, 9, 7)) 
fig <- fig %>%
  layout(yaxis = list(categoryarray = c("Clicked on Google Ad", "Explored Website", "Downloaded Brochure", "Added to basket", "Made Payment")))
fig
Source by Author

Immediately one sees that the issue lies on your website, since this is where the most customer churn is happening: of the 80 visitors who explored the website, only 10 went on to download the brochure. However, your brochure appears to be very informative, since 9 of the 10 customers who downloaded it added a product to their basket and 7 made a payment.
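
To put numbers on that reading of the funnel, here is a small base R sketch that works out the conversion rate at each step from the same made-up figures. The names `stage`, `visitors` and `funnel` are mine and only for illustration.

```r
stage <- c("Clicked on Google Ad", "Explored Website", "Downloaded Brochure",
           "Added to basket", "Made Payment")
visitors <- c(100, 80, 10, 9, 7)

# Share of the previous stage's visitors that reached each stage
step_conversion <- c(NA, visitors[-1] / visitors[-length(visitors)])
# Share of the original audience still present at each stage
overall_conversion <- visitors / visitors[1]

funnel <- data.frame(stage, visitors,
                     step_conversion = round(step_conversion, 3),
                     overall_conversion = round(overall_conversion, 3))
funnel

# Stage with the lowest step conversion, i.e. the biggest leak in the funnel
funnel$stage[which.min(funnel$step_conversion)]
```

The lowest step conversion is at "Downloaded Brochure" (10 out of 80, or 12.5%), which matches what the funnel plot shows visually.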

5. Quantmod

For anyone who likes analysing shares, as a hobby or as a job, Quantmod is for you. It is a package for quantitative financial modelling and charting.

# Install quantmod
install.packages('quantmod')
library(quantmod)
#Select the company/share you want to visualise - I chose Google's shares provided by Yahoo Finance
getSymbols("GOOG",src="yahoo") 
#Plot 1: Closing Price of Google Shares Over Time
plot(Cl(GOOG), col = 'black') 
Source by Author
#Plot 2: Candle Chart
candleChart(GOOG,multi.col=TRUE,theme='black', subset = 'last 3 months')
Source by Author
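
Beyond plotting, quantmod also has helpers for simple calculations. The sketch below computes day-on-day returns with `Cl()` and `Delt()`. So that it runs without an internet connection, it uses a handful of made-up prices instead of the downloaded GOOG data; the column name `DEMO.Close` and the variable names are hypothetical.

```r
library(quantmod)  # also attaches xts and zoo

# Made-up closing prices for five consecutive days
dates  <- as.Date("2021-01-04") + 0:4
prices <- xts(c(100, 110, 99, 99, 108.9), order.by = dates)
colnames(prices) <- "DEMO.Close"

# Cl() extracts the closing-price column; Delt() gives the one-period change
daily_return <- Delt(Cl(prices))
daily_return
```

The first value is `NA` because there is no previous day to compare against. With real data, the same call would be `Delt(Cl(GOOG))`.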

6. RGL

RGL is the package to reach for if you need to plot anything in 3D.

#Install RGL
install.packages("rgl")
library("rgl")
#The next line of code produces a demo of what RGL can visualise. Follow the prompts in your console, but after pressing enter, it will produce a number of 3D visualisations
demo(rgl)

Here are some examples the demo produced:

Source: Murdoch, D & Adler, D (2021)
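
If you want to try RGL on your own data rather than the demo, a 3D scatter plot with `plot3d()` is a good starting point. This is a minimal sketch using R’s built-in iris data; the dataset, colours and the name `species_cols` are my own choices, and the first line is only there so the code also runs on a machine without a display.

```r
# Use rgl's null device when no display is available (e.g. on a server)
if (!interactive()) options(rgl.useNULL = TRUE)
library(rgl)

# 3D scatter plot of three iris measurements, coloured by species
species_cols <- c("tomato", "seagreen", "steelblue")[as.integer(iris$Species)]
plot3d(x = iris$Sepal.Length, y = iris$Petal.Length, z = iris$Petal.Width,
       col = species_cols, type = "s", size = 1,
       xlab = "Sepal length", ylab = "Petal length", zlab = "Petal width")
```

In an interactive session, the plot opens in its own window, where you can rotate it with the mouse.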

7. Patchwork

Lastly, what about bringing all your visualisations together? Patchwork was created by Thomas Lin Pedersen in 2017, to make this an easy exercise. You can download it from CRAN and here is the GitHub repository.

#Install packages
devtools::install_github("thomasp85/patchwork")
library(dplyr)
library(ggplot2)
library(RColorBrewer)
library(ggcorrplot)
library(patchwork)
# All you need to do is assign your visualisations to variables
# (`happy` and `corr` were created in the ggplot2 and ColourPicker examples above)
#Viz 1 - named p1
p1 <- happy %>%
  group_by(Country) %>% 
  summarise(HappinessScore_Avg = mean(Score)) %>%
  top_n(30, HappinessScore_Avg) %>%  
  ggplot(aes(x = reorder(factor(Country), HappinessScore_Avg), y = HappinessScore_Avg, fill = HappinessScore_Avg)) + 
  geom_bar(stat = "identity") + 
  xlab("Country") + 
  ylab("Average Happiness Score") + 
  scale_y_continuous(expand = expansion(mult = c(0, 0.05))) +
  theme(legend.position = 'none',
        axis.title.y = element_text(size = 8), axis.text.y = element_text(size = 8),
        axis.title.x = element_text(size = 5), axis.text.x = element_text(size = 5),
        plot.title = element_text(size = 10, hjust = 0.5)) +
  scale_fill_gradientn(name = '', colours = rev(brewer.pal(5, 'Spectral'))) +
  geom_text(aes(label = HappinessScore_Avg), hjust = -0.3, size = 2) +
  ggtitle("Top 30 Countries by Happiness Score") +
  coord_flip()
#Viz 2 - named p2
p2 <- ggcorrplot(corr, hc.order = TRUE, 
           type = "lower", 
           lab = TRUE, 
           lab_size = 2, 
           method = "circle", 
           colors = c("red2", "white", "dodgerblue2"),
           outline.color = "#0D0C0C",
           title = "Correlogram of car specs")
#Combine the saved visualisations side by side as below:
p1 + p2

The output will look as follows:

Source by Author

Furthermore, here is a guide on how to patch your graphs together using various styles and annotations to neaten them up.
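
As a taste of those layout options, here is a short sketch that arranges three plots in two rows and adds an overall title. It uses the built-in mtcars data so it runs on its own; the plot names `g1` to `g3` and `combined` are just for illustration.

```r
library(ggplot2)
library(patchwork)

# Three small plots built from the built-in mtcars data
g1 <- ggplot(mtcars, aes(wt, mpg)) + geom_point() + ggtitle("Weight vs mpg")
g2 <- ggplot(mtcars, aes(factor(cyl))) + geom_bar() + ggtitle("Cars per cylinder count")
g3 <- ggplot(mtcars, aes(hp)) + geom_histogram(bins = 10) + ggtitle("Horsepower")

# `|` places plots side by side, `/` stacks them; plot_annotation() adds
# an overall title and automatic panel tags (A, B, C)
combined <- (g1 | g2) / g3 +
  plot_annotation(title = "mtcars at a glance", tag_levels = "A")
combined
```

The brackets matter here: `(g1 | g2) / g3` puts the first two plots on the top row and the third underneath, spanning the full width.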

Conclusion

Photo by Anoop Surendran on Unsplash

You are now fully equipped to explore interesting datasets. Each of these R packages has plenty of examples in its own documentation and repository. I encourage you to pick a dataset that interests you and dive into these libraries. If you enjoyed this article, you may also want to read – This Is How You Should Be Visualizing Your Data: 10 Examples to Guide Your Analysis.



