Introduction
In my previous post, 7-day Challenge – Mastering Ggplot2: Day 1 – Bar Chart, I shared my experience of building a decent bar chart with some core elements. In this article, I'll continue by sharing what I learned about the line chart.
This is not meant to be a comprehensive guide that tells you everything about ggplot2. That scope would be too much to cover, and a book would serve you better. What you can expect instead is coverage of the fundamental elements of each chart type, which I believe you can customize and apply flexibly in different situations.
In short, I would rather call this article a cheat sheet for ggplot2 visualization, and I hope it is of some help to you.
Dataset
The most common use of a line chart is to describe how something trends over time. For example: the revenue of three export companies from 2000 to 2021.
Therefore, I will visualize a time-series dataset from the "tsibble" package in R. The dataset is named "pedestrian" and records the number of pedestrians counted every hour by different sensors in the city of Melbourne from 2015 to 2016.
Below is a glimpse of my dataset:
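For readers following along without the image, here is a minimal sketch of one way to produce a similar glimpse. It assumes the tsibble and dplyr packages are installed. The column names in the comment are those documented for the packaged dataset. Note that month_year, which the code later in this post relies on, is not among them, so it has to be derived first.

```r
library(tsibble)  # ships the pedestrian dataset
library(dplyr)    # provides glimpse()

# Print each column with its type and first few values
glimpse(pedestrian)

# Documented columns: Sensor, Date_Time, Date, Time, Count
names(pedestrian)
```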
Single Line Graph
Target: Total number of pedestrians in different months of the year.
I plot my basic line graph using the elements covered in my previous post about the bar chart: title, x-axis, y-axis, theme, color, and so on. I will not explain those attributes again here and will go directly to the graph code.
In the same way that geom_bar is used to produce bars in a bar chart, geom_line is used to create lines in a line chart.
Now, let’s look at fig 1 in my simple line chart development image below. By using geom_line, theme, xlab, ylab, and ggtitle, fig 1 is created with no difficulties. My code for figure 1 is as follows.
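#Packages used throughout (ggthemes provides theme_economist)
library(tsibble); library(dplyr); library(ggplot2); library(ggthemes)
#Calculating number of pedestrians per month
##Assumption: month_year is "YYYY-MM" derived from Date; as_tibble() drops the tsibble index
df1 <- as_tibble(pedestrian) %>%
  mutate(month_year = format(Date, "%Y-%m"))
count_all <- df1 %>%
  group_by(month_year) %>%
  summarise(total = sum(Count))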
#Fig 1 - simple line chart
f1 <- ggplot(count_all, aes(x=month_year, y=total)) +
  ##Create a line by calling geom_line
  geom_line(group=1) +
  ##Modify titles
  xlab("Year-Month") +
  ylab("Total pedestrians (Thousands)") +
  ggtitle("Total Pedestrians by Months (Fig 1)") +
  ##Change theme
  theme_economist() + scale_fill_economist() +
  theme(
    plot.title = element_text(color="black", size=14, face="bold", hjust = 0.5),
    axis.title.x = element_text(color="black", size=11, face="bold"),
    axis.title.y = element_text(color="black", size=11, face="bold"))
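One small note about geom_line: when you plot a single line and the x-axis variable is discrete (such as a character or factor column of months), remember to set group=1 in geom_line. Otherwise ggplot2 treats each x value as its own group, reports that each group consists of only one observation, and draws no line. Put simply, all points must belong to one group to be linked, hence group=1.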
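To see what group=1 changes, here is a small sketch on made-up data. The toy data frame and the plot names are hypothetical and only stand in for the real monthly totals. It compares the groups ggplot2 computes with and without the setting.

```r
library(ggplot2)

# Hypothetical toy data with a character x-axis such as "2015-01"
toy <- data.frame(
  month_year = c("2015-01", "2015-02", "2015-03"),
  total = c(120, 95, 140)
)

# Without group = 1, every month becomes its own group
p_no_group <- ggplot(toy, aes(x = month_year, y = total)) +
  geom_line()

# With group = 1, all rows fall into a single group and get connected
p_grouped <- ggplot(toy, aes(x = month_year, y = total)) +
  geom_line(group = 1)

# Count the groups computed for each version
n_groups_without <- length(unique(layer_data(p_no_group)$group))
n_groups_with <- length(unique(layer_data(p_grouped)$group))
```

Printing p_no_group draws an empty panel, while p_grouped draws one connected line.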
However, my fig 1 graph is quite messy, as you can see. Because the values on my y-axis are so large, the tick labels are difficult to follow. Worse, the tick labels on my x-axis overlap one another and can’t be read. To fix both problems, I made two small changes:
- Customizing axis.text.x in theme to rotate the x-axis tick labels 45 degrees, by setting angle=45.
- Rescaling the y-axis tick labels with scale_y_continuous so that the numbers are shown in thousands and are easier to read.
Here is how I transformed it into fig 2.
#Building figure 2 on top of figure 1
f2 <- f1 +
  ylab("Total pedestrians (Thousands)") +
  ##Modifying labels on axes
  theme(axis.text.x = element_text(angle = 45, vjust = 0.5, hjust=0.5, size = 9)) +
  scale_y_continuous(labels = function(x) format(x/1000))
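The labels argument of scale_y_continuous accepts a function that receives the numeric break values and returns the text to display. As a rough sketch, the same function can be tried on its own. The name to_thousands and the sample values are made up for illustration, and the big.mark variant is an optional extra that the figure itself does not use.

```r
# Same transformation as in the plot code: divide by 1000, then format as text
to_thousands <- function(x) format(x / 1000)

# Hypothetical break values of the size a monthly total might take
sample_breaks <- c(1500000, 2000000, 2500000)
plain_labels <- to_thousands(sample_breaks)

# Optional variant: add a thousands separator with base R's big.mark argument
to_thousands_sep <- function(x) format(x / 1000, big.mark = ",")
separated_labels <- to_thousands_sep(sample_breaks)
```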
For fig 3, I tried to improve fig 2 by adding color to the graph and adjusting the size of the line. I also added points to the line graph with geom_point.
f3 <- f2 +
  ##Note: in ggplot2 3.4.0 and later, linewidth replaces size for lines
  geom_line(group=1, size = 1.5, color = "darkblue") +
  geom_point(size = 3.5, color = "darkblue", shape =17)
There are many shapes available for geom_point, and it is worth reading up on them to select your best choice. Similarly, you can choose different types of lines by setting linetype to one of these: "twodash", "blank", "dashed", "solid", "longdash", "dotted", "dotdash".
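As a quick illustration of linetype combined with a point shape, here is a sketch on made-up data. The toy data frame and the name p_dashed are hypothetical and are not part of the figures in this post.

```r
library(ggplot2)

# Hypothetical toy data standing in for the monthly totals
toy <- data.frame(
  month_year = c("2015-01", "2015-02", "2015-03"),
  total = c(120, 95, 140)
)

# A dashed dark blue line with triangle markers (shape 17)
p_dashed <- ggplot(toy, aes(x = month_year, y = total, group = 1)) +
  geom_line(linetype = "dashed", color = "darkblue") +
  geom_point(shape = 17, size = 3.5, color = "darkblue")

# The line layer records the chosen linetype for every row
line_types <- unique(layer_data(p_dashed, 1)$linetype)
```

Swapping "dashed" for any other value in the list above changes the line pattern without touching the rest of the plot.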
In fig 4, I realized I do not necessarily need geom_point any more, and I want to label my line graph with the actual values. How can I do it? It’s effortless: just add geom_label to what you already had in the previous figure.
#Adding graph labels
f4 <- f3 + geom_label(aes(label = round(total/1000,0)), size =3, color = 'darkblue')
My final graph looks so much better than the first one, right?
Multiple Line Graph
Target: Total number of pedestrians in different months of the year counted by each sensor.
Simple multiple line graph
A multiple-line graph is built in basically the same way as a single-line graph. The one difference is in the grouping: we no longer set group=1 in geom_line. Instead, we specify group=Sensor and color=Sensor inside aes so that R can tell the four sensors apart and draw one line for each.
Here is how I built the multiple-line graph in image 4, using these attributes together with the modifications from the single-line graph.
#Calculating number of pedestrians by each sensor during the period.
count_sensor <- df1 %>%
  group_by(month_year, Sensor) %>%
  summarise(total = sum(Count))
#Plotting
##Specifying group and color group
f5 <- ggplot(count_sensor, aes(x=month_year, y=total, group=Sensor, color=Sensor)) +
  geom_line(size = 1.5) +
  ##Modifying titles, labels and background
  xlab("Year-Month") + ylab("Total pedestrians") +
  ggtitle("Total Pedestrians Counted by Sensors (Fig 1)") +
  scale_y_continuous(labels = function(x) format(x/1000)) +
  theme_economist() + scale_fill_economist() +
  theme(
    plot.title = element_text(color="black", size=15, face="bold", hjust = 0.5),
    axis.title.x = element_text(color="black", size=11, face="bold"),
    axis.title.y = element_text(color="black", size=11, face="bold"),
    axis.text.x = element_text(angle = 45, vjust = 0.5, hjust=0.5, size = 9))
Modifications (Legend, Colors, etc)
Color
Suppose you are not happy with the colors of the four lines in image 4 and want to change them. You can always do that easily with scale_colour_manual.
#Manually set the colors
f5 +
scale_colour_manual(values=c("darkblue","darkgreen","red","darkgrey"))
Legend
You can also change your legend. Here are a few attributes we need to care about:
- theme(legend.position = position): Change the legend position to "left", "top", "right", "bottom", "none" or c(x, y). The x and y values must range between 0 and 1; c(0,0) represents (left, bottom) and c(1,1) represents (right, top).
- theme(legend.background): Change the background of the legend.
- element_text in the theme attribute: Change the font size of the legend.
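The three legend settings listed above can be combined in a single theme call. Below is a minimal sketch on made-up data. The toy data frame, the sensor names, and the fill color are illustrative assumptions rather than values from the figures in this post.

```r
library(ggplot2)

# Hypothetical toy data: two sensors over three months
toy <- data.frame(
  month_year = rep(c("2015-01", "2015-02", "2015-03"), times = 2),
  Sensor = rep(c("Sensor A", "Sensor B"), each = 3),
  total = c(120, 95, 140, 80, 110, 100)
)

p_legend <- ggplot(toy, aes(x = month_year, y = total,
                            group = Sensor, color = Sensor)) +
  geom_line() +
  theme(
    # Move the legend below the plot
    legend.position = "bottom",
    # Light grey legend box with a black border
    legend.background = element_rect(fill = "grey95", color = "black"),
    # Smaller legend text
    legend.text = element_text(size = 10)
  )

# One group per sensor is what the legend describes
n_legend_groups <- length(unique(layer_data(p_legend)$group))
```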
As such, I applied these to customize my graph visual:
f6 <- f5 +
  #Setting line colors
  scale_colour_manual(values=c("darkblue","darkgreen","red","darkgrey")) +
  #Adjusting legend theme
  ##Position
  theme(legend.position="top") +
  ##Changing text attributes
  theme(legend.title = element_text(colour="black", size=12,
                                    face="bold.italic"),
        legend.text = element_text(colour="black", size=12,
                                   face="italic"))
Value Label Modification
What if I want to label only the maximum and minimum pedestrian counts for "Bourke Street Mall (North)"? How can I do it?
It’s very simple: filter the values you want to label and pass them to geom_label through its data argument.
##Filter values
values <- count_sensor %>% filter(Sensor == 'Bourke Street Mall (North)', month_year %in% c("2016-12","2015-11"))
##Plot
f6 + geom_label(
aes(label = round(total/1000,0)), data=values, size =3, color = 'darkgreen')
You can apply my method of filtering value labels to similar cases:
- Adding labels to just 1 line
- Adding labels to a specific level of x value
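Both cases follow the same pattern: filter first, then hand the filtered rows to geom_label through data. Here is a rough sketch on made-up data. The toy data frame, the sensor names, and the plot names are hypothetical.

```r
library(ggplot2)
library(dplyr)

# Hypothetical toy data: two sensors over three months
toy <- data.frame(
  month_year = rep(c("2015-01", "2015-02", "2015-03"), times = 2),
  Sensor = rep(c("Sensor A", "Sensor B"), each = 3),
  total = c(120, 95, 140, 80, 110, 100)
)

base_plot <- ggplot(toy, aes(x = month_year, y = total,
                             group = Sensor, color = Sensor)) +
  geom_line()

# Case 1: label every point of a single line
one_line <- toy %>% filter(Sensor == "Sensor A")
p_one_line <- base_plot +
  geom_label(aes(label = total), data = one_line, show.legend = FALSE)

# Case 2: label all lines at one specific x value
one_month <- toy %>% filter(month_year == "2015-02")
p_one_month <- base_plot +
  geom_label(aes(label = total), data = one_month, show.legend = FALSE)
```

Because the label layer receives its own data, the lines are still drawn from the full data set and only the filtered rows get a label.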
Conclusion
Above are some tips I picked up while learning to plot a line chart in ggplot2. I have to say, the more I look into this library, the more interesting things I find, and I am excited to explore further. The layered approach to plotting makes visualization easier and more convenient than ever.
Definitely keep up the learning.
Next, please join me for the Day 3 topic: Slope Graph.