
7-day Challenge - Mastering Ggplot2: Day 1 -  Bar Chart

A guide to getting to know ggplot2 visualization in R from zero.

Introduction

I am originally a Python user, and I have used Python for all of my analytical tasks. Thus, I did not think it was necessary to know R simultaneously. However, as I am studying for my master’s in Statistics, I’ve been increasingly interested in the R programming language, especially its stunning graphics.

I found R visualization incredibly user-friendly, especially for statistical visualization tasks. And ggplot2 is one of the most helpful packages, one I believe is worth the effort to learn in more depth. I previously wrote an overview of basic plots in ggplot2, Guide To Data Visualization With ggplot2, which you can find here.

Since I want to learn more about ggplot2 for future research, I decided to take on the challenge of mastering this visualization package within 7 days. Medium will be where I keep track of my progress. Hopefully, this will give me the motivation to complete my challenge while also sharing what I’ve learned with readers.

In my first article, I am going to show you what I have explored in making neat and nice bar charts.

Basic Bar Graph

The bar graph is frequently used to depict the distribution of a variable’s values or to compare data from different sub-groups within a data category. I believe that a bar chart is one of the most useful and powerful basic graphs that helps us gain significant insights in a variety of situations. Thus, anytime I learn a new visualization library, I usually begin with a bar chart.

But first, let me show you the dataset that I am going to work with throughout this article. You can easily get the dataset from the R package regclass. For my practice, I will use the dataset EX2.TIPS, which records the tip amounts of different parties.

Here is the summary of the dataset:

library(regclass)
library("skimr")
data(EX2.TIPS)
skim(EX2.TIPS)

Let’s start with a simple plot.

Target: I want to find how many females and males are in the dataset. In other words, I want to see the distribution of the variable "Gender". The plot can be done simply with a call to ggplot and geom_bar.

library(ggplot2)
library(dplyr)
#Count number of people in each gender
count_gender <- 
  EX2.TIPS %>%
  group_by(Gender) %>%
  summarise(count = n())
#Plot x = gender, y = gender count
#stat = "identity" tells geom_bar to use the count column as the bar height
ggplot(count_gender, aes(x = Gender, y = count)) + 
  geom_bar(stat = "identity")
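
As a side note, geom_bar counts rows on its own by default (stat = "count"), so the pre-aggregation step is optional. The sketch below compares the two routes on a tiny made-up data frame, toy_tips, which is only a stand-in for EX2.TIPS (its values are invented for illustration); the object names are my own.

```r
library(ggplot2)

# Hypothetical stand-in for EX2.TIPS; values are invented for illustration
toy_tips <- data.frame(
  Gender = c("Female", "Male", "Male", "Female", "Male", "Male")
)

# Route 1: let geom_bar() count the rows itself (default stat = "count")
p_count <- ggplot(toy_tips, aes(x = Gender)) +
  geom_bar()

# Route 2: pre-aggregate, then plot the counts as given with geom_col(),
# which is shorthand for geom_bar(stat = "identity")
toy_counts <- as.data.frame(table(Gender = toy_tips$Gender),
                            responseName = "count")
p_col <- ggplot(toy_counts, aes(x = Gender, y = count)) +
  geom_col()

# Extract the bar heights that each plot would draw
heights_count <- layer_data(p_count)$count
heights_col   <- layer_data(p_col)$y
```

Both routes should draw the same bars; pre-aggregating is mainly useful when the counts are needed again, for example as text labels.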

But wait, the graph seems a little plain without a title and color. Don’t worry. You can easily customize your bar chart with different attributes, such as the color and the width of the bars. Here are several attributes you can use to update your chart.

  • To change your bar chart’s color, set the fill attribute in geom_bar
  • To change the border color of the bars, use the color attribute
  • To set the chart title and the x- and y-axis names, use ggtitle, xlab, and ylab
  • To customize the font or color of the axis titles and the chart title, make changes to theme
  • To show the value on each bar, add a geom_text

ggplot(count_gender, aes(x = Gender, y = count)) + 
#customize bars 
  geom_bar(color = "black",
           fill = "pink",
           width = 0.5,
           stat = 'identity') +
#adding values numbers
  geom_text(aes(label = count), 
            vjust = -0.25) +
#customize x,y axes and title
  ggtitle("Distribution of Gender") +
  xlab("Gender") + 
  ylab("Total") +
#change font
  theme(plot.title = element_text(color = "black", size = 14, face = "bold", hjust = 0.5),
        axis.title.x = element_text(color = "black", size = 11, face = "bold"),
        axis.title.y = element_text(color = "black", size = 11, face = "bold"))

If you want a horizontal bar chart instead of a vertical one, it’s very simple: just add a coord_flip() element. And here is what we get:
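
For reference, a minimal sketch of the flipped chart is shown below. It uses made-up counts in a data frame I call toy_counts (not the real EX2.TIPS values), so only the shape of the code carries over.

```r
library(ggplot2)

# Hypothetical counts, invented for illustration (not the EX2.TIPS values)
toy_counts <- data.frame(Gender = c("Female", "Male"), count = c(2, 4))

p_flipped <- ggplot(toy_counts, aes(x = Gender, y = count)) +
  geom_bar(stat = "identity", fill = "pink", color = "black", width = 0.5) +
  coord_flip()   # swap the axes so the bars run horizontally

p_flipped
```

The aesthetics stay the same (x = Gender, y = count); coord_flip() only changes how the axes are drawn.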

As you can see in the graph, the number of men going to the party is nearly twice the number of women.

Stacked Bar Chart

Target: For each day of the week, I want to compare the number of men going to parties with the number of women.

I can do this by using a stacked bar chart to observe the difference between these two genders.

A stacked bar chart is achieved by setting position in geom_bar to "stack". In addition, by setting fill to Gender, ggplot will generate a bar chart where each bar (representing one level of the "Weekday" variable) is split into segments for all levels of the "Gender" variable.

Here is what my simple stacked bar chart looks like:

ggplot(EX2.TIPS, aes(x = Weekday,fill=Gender)) + 
  geom_bar(position = "stack")

As we can see in the above graph, there are two levels of Gender and four levels of Weekday, and there is a value for the number of people in each gender on each day of the week.

However, in order to compare the ratio of males and females on different days of the week, I prefer to use a segmented bar plot. It is a type of stacked bar chart where each bar sums to 100%. By specifying the argument position = "fill" in geom_bar, we can easily get a segmented bar plot as below:

ggplot(EX2.TIPS, aes(x = Weekday,fill=Gender)) + 
  geom_bar(position = "fill")
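
To make the difference between "stack" and "fill" concrete, here is a small check on a made-up data frame, toy_tips (a stand-in for EX2.TIPS with invented rows). It is only a sketch: it uses layer_data to read off the top of each whole bar under both positions.

```r
library(ggplot2)

# Hypothetical stand-in for EX2.TIPS; rows are invented for illustration
toy_tips <- data.frame(
  Weekday = c("Friday", "Friday", "Friday", "Saturday", "Saturday"),
  Gender  = c("Female", "Male", "Male", "Female", "Male")
)

p_stack <- ggplot(toy_tips, aes(x = Weekday, fill = Gender)) +
  geom_bar(position = "stack")
p_fill <- ggplot(toy_tips, aes(x = Weekday, fill = Gender)) +
  geom_bar(position = "fill")

# Top of each whole bar = largest ymax among the segments at that x position
d_stack <- layer_data(p_stack)
d_fill  <- layer_data(p_fill)
top_stack <- tapply(d_stack$ymax, as.numeric(d_stack$x), max)
top_fill  <- tapply(d_fill$ymax,  as.numeric(d_fill$x),  max)
```

With "stack" the bar tops are the raw totals per day, while with "fill" every bar is rescaled so that its top sits at 1, which is why the segments can be read as proportions.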

Now, let’s try to improve this segmented bar chart by adding labels, titles, and customizing the x-axis, y-axis, etc. Besides some of the control elements mentioned above, we also have:

  • scale_y_continuous controls the position scale for continuous data (the y-axis)
  • scale_fill_brewer sets the color fills for the bar chart
  • Install and load the ggthemes package for different chart themes (theme_stata in the code below comes from it).

In detail, you can see the code below:

library(ggthemes)  #provides theme_stata()
#Calculating the percentage of both genders in each day of the week
pctdata <- EX2.TIPS %>%
  group_by(Weekday, Gender) %>%
  summarize(count = n()) %>% 
  #the result is still grouped by Weekday, so sum(count) is the daily total
  mutate(pct = count/sum(count),
         percent_scale = scales::percent(pct))
#Plotting 
ggplot(pctdata, aes(x = Weekday, y = pct, fill = Gender)) + 
  geom_bar(position = "fill", 
           stat = 'identity') +
#Adjusting y-axis tick mark   
  scale_y_continuous(breaks = seq(0, 1, .2), 
                     labels = scales::percent) + 
#Adding value label
  geom_text(aes(label = percent_scale), 
            size = 3, 
            position = position_stack(vjust = 0.5)) + 
#Adjusting color fill  
  scale_fill_brewer(palette = "Set3") + 
#Adjusting title, labels 
  ggtitle("Gender Distribution") +
  xlab("Days of Week") + ylab("Percentage") +
#Changing theme  
  theme_stata() +
  theme(
    plot.title = element_text(color = "black", size = 14, face = "bold", hjust = 0.5),
    axis.title.x = element_text(color = "black", size = 11, face = "bold"),
    axis.title.y = element_text(color = "black", size = 11, face = "bold"))

Side-by-side Bar Chart

Target: I want to track the number of both females and males going to parties on different days of the week.

I can use a grouped bar chart to see the trend of both genders during the week.

So, how can I get a grouped bar chart? With ggplot, it is simple: you just have to change position in the geom_bar element to "dodge". Everything else is done the same way as when we plotted the simple bar chart.

#Count number of people of each gender on each day of the week
count_day <- EX2.TIPS %>%
  group_by(Weekday, Gender) %>%
  summarise(count = n())
#Plotting
ggplot(count_day, aes(x = Weekday, y = count, fill = Gender)) + 
  geom_bar(position = "dodge", 
           stat = 'identity') +
#Adding value label, dodged to sit on top of the matching bar
  geom_text(aes(label = count),
            colour = "black", 
            size = 3,
            vjust = 1.5, 
            position = position_dodge(.9)) + 
  scale_fill_brewer(palette = "Set3") + 
  ggtitle("Number of females/males by days of the week") +
  xlab("Days of Week") + ylab("Total number") + 
  theme_stata() +
  theme(
    plot.title = element_text(color = "black", size = 14, face = "bold", hjust = 0.5),
    axis.title.x = element_text(color = "black", size = 11, face = "bold"),
    axis.title.y = element_text(color = "black", size = 11, face = "bold"))

Put Different Bar Charts In One Place

Sometimes, we want to put 2 graphs side by side for easier comparison. While searching for a solution that I can use flexibly, I came across this website. I have to say that it is very detailed, and the solution is easy to customize. I will apply their suggestion to my case here by using the grid library.

For instance, I want to place my Gender Distribution graph (stored in an object named p) and my Number of females/males by days of the week graph (stored in an object named m) side by side. Here is how I do it:

library(grid)
# Creating a new page 
 grid.newpage()
# Create layout: nrow = 1, ncol = 2
 pushViewport(viewport(layout = grid.layout(nrow = 1, ncol = 2)))

# A helper function to define a region on the layout
 define_region <- function(row, col){
   viewport(layout.pos.row = row, layout.pos.col = col)
 } 
# Identify plot positions
 print(p, vp = define_region(row = 1, col = 1))   
 print(m, vp = define_region(row = 1, col = 2))
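
The layout code expects two existing plot objects. As a self-contained sketch of the whole flow, the example below builds simplified versions of p and m from a made-up data frame, toy_counts (invented values, not EX2.TIPS), and then prints them into a 1 x 2 grid layout.

```r
library(ggplot2)
library(grid)

# Hypothetical counts, invented for illustration (not the EX2.TIPS values)
toy_counts <- data.frame(
  Weekday = c("Friday", "Friday", "Saturday", "Saturday"),
  Gender  = c("Female", "Male", "Female", "Male"),
  count   = c(1, 2, 1, 1)
)

# Store each plot in an object instead of printing it right away
p <- ggplot(toy_counts, aes(x = Weekday, y = count, fill = Gender)) +
  geom_bar(position = "fill", stat = "identity") +
  ggtitle("Gender Distribution")

m <- ggplot(toy_counts, aes(x = Weekday, y = count, fill = Gender)) +
  geom_bar(position = "dodge", stat = "identity") +
  ggtitle("Number of females/males by days of the week")

# One row, two columns; each plot is printed into its own cell
grid.newpage()
pushViewport(viewport(layout = grid.layout(nrow = 1, ncol = 2)))
print(p, vp = viewport(layout.pos.row = 1, layout.pos.col = 1))
print(m, vp = viewport(layout.pos.row = 1, layout.pos.col = 2))
popViewport()
```

Assigning a plot to a name (p <- ggplot(...) + ...) is all that is needed to reuse the charts from the earlier sections in this layout.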

Conclusion

Yeah. That’s what I learned about bar graphs in ggplot2 on day 1 of my 7-day challenge to master ggplot2. If you have anything interesting to share with me about this cool library, let me know. I will be back soon with day 2’s topic: the line graph.

In order to receive updates regarding my upcoming posts, kindly subscribe as a member using the provided Medium Link.

Reference

http://www.sthda.com/english/articles/24-ggpubr-publication-ready-plots/81-ggplot2-easy-way-to-mix-multiple-graphs-on-the-same-page/

https://rkabacoff.github.io/datavis/Bivariate.html#Categorical-Categorical

