
7-day Challenge - Mastering Ggplot2: Day 2-Line Graph

A guide to getting to know ggplot2 visualization in R from zero - Line Graph

Introduction

In my previous post, 7-day Challenge – Mastering Ggplot2: Day 1 – Bar Chart, I shared how I built a decent bar chart from a few core elements. In this article, I’ll continue with what I learned about line charts.

This is not meant to be a comprehensive guide to everything in ggplot2. That scope is too large for an article, and a book would serve you better. Instead, this article covers the fundamental elements of each chart type, which you can then customize for your own situations.

So, in short, I would rather call my article a cheat sheet for ggplot2 visualization, and I hope it can be of some help to you.

Dataset

The most common use of a line chart is to show how something changes over time. Example: the revenue of 3 export companies from 2000 to 2021.

Therefore, I will visualize a time-series dataset from the "tsibble" package in R. The dataset is called "pedestrian", and it records the number of pedestrians counted every hour by four sensors in the city of Melbourne from 2015 to 2016.

Below is a glimpse of my dataset:
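If the preview does not display for you, a minimal way to inspect the data yourself is sketched below. It assumes the tsibble and dplyr packages are installed; the name sensor_rows is only illustrative.

```r
library(tsibble)  # ships the pedestrian dataset
library(dplyr)    # glimpse(), count(), %>%

# Raw columns: Sensor, Date_Time, Date, Time, Count
glimpse(pedestrian)

# Rows recorded per sensor (converted to a plain tibble first so that
# dplyr verbs behave as they would on an ordinary data frame)
sensor_rows <- pedestrian %>%
  as_tibble() %>%
  count(Sensor)
sensor_rows
```

Note that month_year, which the plotting code in this article groups by, is not one of the raw columns, so it has to be derived from Date before plotting.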

Single Line Graph

Target: Total number of pedestrians in different months of the year.

I build my basic line graph from the elements covered in my previous post about the bar chart: title, x-axis, y-axis, theme, color, and so on. I will not explain those attributes again and will go straight to the graph code.

In the same way that geom_bar is used to produce bars in a bar chart, geom_line is used to create lines in a line chart.

Now, let’s look at fig 1 in my simple line chart development image below. Using geom_line, theme, xlab, ylab, and ggtitle, fig 1 is created without difficulty. My code for figure 1 is as follows.

#Loading the packages used in this article
library(tsibble)    #pedestrian dataset
library(dplyr)      #%>%, group_by, summarise
library(ggplot2)
library(ggthemes)   #theme_economist
#Calculating number of pedestrians per month
##Assumption: month_year is not a column of the raw data, so it is derived here as "YYYY-MM" text
df1 <- pedestrian %>%
       as_tibble() %>%
       mutate(month_year = format(Date, "%Y-%m"))
count_all <- df1 %>%
             group_by(month_year) %>%
             summarise(total = sum(Count))
#Fig 1- simple line chart
f1 <- ggplot(count_all, aes(x=month_year, y=total)) +

  ##Create a line by calling geom_line
  geom_line(group=1) +

  ##Modify titles
  xlab("Year-Month") +
  ylab("Total pedestrians") +
  ggtitle("Total Pedestrians by Months (Fig 1)") +

  ##Change theme
  theme_economist() + scale_fill_economist() +
  theme(
    plot.title = element_text(color="black", size=14, face="bold", hjust = 0.5),
    axis.title.x = element_text(color="black", size=11, face="bold"),
    axis.title.y = element_text(color="black", size=11, face="bold"))

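There is one thing to note about geom_line. Here the x-axis variable, month_year, is discrete (for example text such as "2015-11"), so ggplot2 treats every x value as its own group. When you plot a single line, remember to state group=1 in geom_line; otherwise no line is drawn and ggplot2 reports that each group consists of only one observation. Put simply, all points must be linked, hence group=1.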
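The effect of group=1 can be checked on a small example. The sketch below uses a made-up data frame (toy and its values are hypothetical, not the pedestrian data) and counts how many groups ggplot2 computes with and without group=1.

```r
library(ggplot2)

# Hypothetical monthly totals; month_year is text, so it is a discrete x
toy <- data.frame(
  month_year = c("2015-01", "2015-02", "2015-03", "2015-04"),
  total      = c(120, 150, 90, 160)
)

# Without group: every month_year value becomes its own one-point group
p_no_group <- ggplot(toy, aes(x = month_year, y = total)) + geom_line()

# With group = 1: all points belong to one group and get connected
p_grouped <- ggplot(toy, aes(x = month_year, y = total)) + geom_line(group = 1)

# Number of distinct groups ggplot2 computed for each version
groups_without <- length(unique(layer_data(p_no_group)$group))
groups_with    <- length(unique(layer_data(p_grouped)$group))
```

With four months, the first plot ends up with four single-point groups and the second with one group, which is why only the second can draw a line.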

However, fig 1 is quite messy, as you can see. The numbers on the y-axis tick labels are so large that the values are hard to follow. Worse, the x-axis tick labels overlap each other and cannot be read. To fix these, I made two small changes:

  • Customizing axis.text.x in theme with angle=45 to rotate the x-axis tick labels by 45 degrees.
  • Rescaling the y-axis labels with scale_y_continuous so that the numbers are shown in thousands.

Here is how I transformed it into fig 2.

#Building figure 2 on top of figure 1
f2 <- f1 +
      ylab("Total pedestrians (Thousands)") +

      ##Modifying labels on axes
      theme(axis.text.x = element_text(angle = 45, vjust = 0.5, hjust = 0.5, size = 9)) +
      scale_y_continuous(labels = function(x) format(x/1000))
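The labels argument of scale_y_continuous accepts any function that turns the break values into text. As a rough sketch, the anonymous function above could be pulled out into a named helper; label_thousands is a hypothetical name, and the big.mark and trim arguments are my additions rather than part of the original code.

```r
# Hypothetical helper: same idea as function(x) format(x/1000),
# plus a thousands separator and no padding
label_thousands <- function(x) {
  format(x / 1000, big.mark = ",", trim = TRUE)
}

# Try it on a few raw counts
label_thousands(c(250000, 1500000, 3000000))
```

It would then be passed as scale_y_continuous(labels = label_thousands).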

For fig 3, I improved fig 2 by adding color to the graph and adjusting the size of the line. I also added points to the line graph with geom_point.

f3 <- f2 +
   ##size sets the line thickness; ggplot2 3.4.0 and later prefer linewidth for lines
   geom_line(group=1, size = 1.5, color = "darkblue") +
   geom_point(size = 3.5, color = "darkblue", shape = 17)

There are many shapes available for geom_point, and you can read more about them in the ggplot2 documentation to pick the one you like best. Similarly, you can choose a different type of line by setting linetype to one of these: "twodash", "blank", "dashed", "solid", "longdash", "dotted", "dotdash".
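As a quick illustration of linetype, the sketch below draws a dashed line with triangle points on a small made-up data frame (toy and its values are hypothetical).

```r
library(ggplot2)

# Hypothetical monthly totals, not the pedestrian data
toy <- data.frame(
  month_year = c("2015-01", "2015-02", "2015-03", "2015-04"),
  total      = c(120, 150, 90, 160)
)

# linetype is set the same way as color: as a fixed argument of geom_line
p_dashed <- ggplot(toy, aes(x = month_year, y = total)) +
  geom_line(group = 1, linetype = "dashed", color = "darkblue") +
  geom_point(size = 3.5, color = "darkblue", shape = 17)
```

Swapping "dashed" for any other name in the list above changes the line pattern without touching the rest of the plot.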

In fig 4, I realized that I do not necessarily need geom_point any more and that I want to label my line graph with actual values. How can I do it? It’s effortless: just add geom_label to what you already have in the previous figure.

#Adding graph labels
f4 <- f3 + geom_label(aes(label = round(total/1000,0)), size =3, color = 'darkblue')

My final graph looks so much better than the first one, right?

Multiple Line Graph

Target: Total number of pedestrians in different months of the year counted by each sensor.

Simple multiple line graph

A multiple-line graph works almost exactly like a single-line graph. The one difference is in the grouping: we no longer state group=1 in geom_line. Instead, we specify group=Sensor and color=Sensor in aes so that ggplot2 can tell the four sensors apart and draw one line for each.

Here is how I built the multiple-line graph in image 4, reusing the attributes and modifications from the single-line graph.

#Calculating number of pedestrians by each sensor during the period.
count_sensor <- df1 %>%
   group_by(month_year, Sensor) %>%
   ##.groups = "drop" returns an ungrouped result (dplyr 1.0.0 or later)
   summarise(total = sum(Count), .groups = "drop")
#Plotting
##Specifying group and color group
f5 <- ggplot(count_sensor, aes(x=month_year, y=total, group=Sensor, color=Sensor)) +
   geom_line(size = 1.5) +
##Modifying titles, labels and background
   xlab("Year-Month") + ylab("Total pedestrians (Thousands)") +
   ggtitle("Total Pedestrians Counted by Sensors (Fig 1)") +

   scale_y_continuous(labels = function(x) format(x/1000)) +

   theme_economist() + scale_fill_economist() +

   theme(
     plot.title = element_text(color="black", size=15, face="bold", hjust = 0.5),
     axis.title.x = element_text(color="black", size=11, face="bold"),
     axis.title.y = element_text(color="black", size=11, face="bold"),
     axis.text.x = element_text(angle = 45, vjust = 0.5, hjust = 0.5, size = 9))

Modifications (Legend, Colors, etc)

Color

Not happy with the colors of the four lines in image 4 and want to change them? If so, you can always do it easily with scale_colour_manual.

#Manually set the colors
f5 +
scale_colour_manual(values=c("darkblue","darkgreen","red","darkgrey"))
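An unnamed values vector assigns colors in the order of the groups. If you would rather tie each color to a specific sensor, scale_colour_manual also accepts a named vector. The sketch below is only an illustration on made-up totals for two sensors; toy and sensor_colours are hypothetical names.

```r
library(ggplot2)

# Hypothetical totals for two sensors
toy <- data.frame(
  month_year = rep(c("2015-01", "2015-02", "2015-03"), times = 2),
  Sensor     = rep(c("Birrarung Marr", "Bourke Street Mall (North)"), each = 3),
  total      = c(40, 55, 35, 120, 150, 90)
)

# Names must match the values of the Sensor column exactly
sensor_colours <- c(
  "Birrarung Marr"             = "darkblue",
  "Bourke Street Mall (North)" = "darkgreen"
)

p_named <- ggplot(toy, aes(x = month_year, y = total,
                           group = Sensor, color = Sensor)) +
  geom_line() +
  scale_colour_manual(values = sensor_colours)
```

With named values, the mapping stays the same even if the order of the sensors in the data changes.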

Legend

You can also change your legend. Here are a few attributes we need to care about:

  • theme(legend.position = position): Change the legend position to "left", "top", "right", "bottom", "none" or c(x, y). The x and y values must range between 0 and 1; c(0,0) represents (left, bottom) and c(1,1) represents (right, top).
  • theme(legend.background): Change the background of the legend.
  • theme(legend.title) and theme(legend.text): Use element_text to change the font size and style of the legend text.

As such, I applied these to customize my graph visual:

f6 <- f5 +
#Setting line colors
   scale_colour_manual(values=c("darkblue","darkgreen","red","darkgrey")) +
#Adjusting legend theme
##Position
   theme(legend.position="top") +
##Changing text attributes
   theme(legend.title = element_text(colour="black", size=12, face="bold.italic"),
         legend.text = element_text(colour="black", size=12, face="italic"))
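The code above does not touch legend.background, so here is a small sketch of how it might be set with element_rect. The data frame, the fill color and the border color are all made up for illustration.

```r
library(ggplot2)

# Hypothetical totals for two sensors
toy <- data.frame(
  month_year = rep(c("2015-01", "2015-02", "2015-03"), times = 2),
  Sensor     = rep(c("Birrarung Marr", "Bourke Street Mall (North)"), each = 3),
  total      = c(40, 55, 35, 120, 150, 90)
)

p_legend <- ggplot(toy, aes(x = month_year, y = total,
                            group = Sensor, color = Sensor)) +
  geom_line() +
  theme(
    legend.position   = "bottom",
    # fill is the box color, colour is the border around the legend
    legend.background = element_rect(fill = "grey95", colour = "black"),
    legend.text       = element_text(size = 10)
  )
```

The same theme() call could be added to any of the plots in this article, since theme settings apply to the plot as a whole.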

Value Label Modification

What if I want to show only the maximum and minimum pedestrian counts of "Bourke Street Mall (North)"?

It’s very simple: filter the rows you want to label and pass them to geom_label through its data argument.

##Filter values 
values <- count_sensor %>% filter(Sensor == 'Bourke Street Mall (North)', month_year %in% c("2016-12","2015-11"))
##Plot
f6 + geom_label(
     aes(label = round(total/1000,0)), data=values, size =3, color = 'darkgreen')

You can apply my method of filtering value labels to similar cases:

  • Adding labels to just 1 line
  • Adding labels to a specific level of x value
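Instead of typing the months by hand, the rows to label can also be picked by value. The sketch below is one possible way to do it, using base R subsetting on made-up totals; toy, one_sensor and extremes are hypothetical names.

```r
library(ggplot2)

# Hypothetical totals for two sensors
toy <- data.frame(
  month_year = rep(c("2015-01", "2015-02", "2015-03", "2015-04"), times = 2),
  Sensor     = rep(c("Birrarung Marr", "Bourke Street Mall (North)"), each = 4),
  total      = c(40000, 55000, 35000, 60000, 120000, 150000, 90000, 160000)
)

# Keep only the lowest and highest month of one sensor
one_sensor <- toy[toy$Sensor == "Bourke Street Mall (North)", ]
extremes   <- one_sensor[one_sensor$total %in% range(one_sensor$total), ]

# Labels are drawn only for the rows passed through the data argument
p_labels <- ggplot(toy, aes(x = month_year, y = total,
                            group = Sensor, color = Sensor)) +
  geom_line() +
  geom_label(aes(label = round(total / 1000, 0)), data = extremes,
             size = 3, color = "darkgreen")
```

Changing the condition that builds extremes is all it takes to label a single line, a single month, or any other subset.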

Conclusion

Above are some tips I picked up while learning to plot a line chart in ggplot2. I have to say, the more I look into this library, the more interesting things I find, and I am excited to explore further. Its layered approach to plotting makes visualization easier and more convenient than ever.

Definitely keep up the learning.

Next, please join me for the Day 3 topic: Slope Graph.

