
In one of my recent projects, I was asked to perform a simple linear regression to forecast possible price developments. As a baseline for comparing the actual price development, we used the consumer price index. This article shows how I approached the same task with a different data set, using ggplot2 for plotting and linear regression for prediction.
1. Setup
I will briefly explain my setup, including the data and R packages that I am using.
Packages
I generally use the tidyverse, a collection of packages that includes ggplot2 for creating attractive plots in a very intuitive way, dplyr for straightforward data manipulation, and much more.
Additionally, I use the ggthemes package, which provides more out-of-the-box themes for ggplot2 plots.
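Here is a minimal sketch of loading both packages at the start of a script. It assumes they are already installed; the commented line installs them once.

```r
# Run once if the packages are not installed yet:
# install.packages(c("tidyverse", "ggthemes"))

library(tidyverse)  # attaches ggplot2, dplyr, tidyr, readr and other core packages
library(ggthemes)   # additional ggplot2 themes, e.g. theme_economist()
```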
Oktoberfest Beer Price and Inflation Index Data
As the main price series, I use available data from the Oktoberfest, the world's largest beer festival. Besides beer prices, the data contains visitor numbers, chicken prices, and the amount of beer sold.

To put the beer price development into context, I compare it with the consumer price index in Germany.
"The Consumer Price Index (CPI) is a way of measuring the overall price level of the consumer goods and services in the economy." (Source: Wikipedia)
The CPI data comes from the database of the Federal Statistical Office of Germany (Destatis). Note that Verbraucherpreisindex (VPI) is German for Consumer Price Index (CPI).
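Beer prices (in currency units) and the VPI (an index value) live on different scales. One common way to compare their development, shown here as an assumption rather than the approach used in this project, is to rebase both series to 100 in a shared base year. The column names, the base year, and all numbers are made-up placeholders.

```r
library(dplyr)

# Made-up placeholder values (NOT real Oktoberfest or Destatis data).
beer_vpi <- data.frame(
  year       = 2015:2019,
  beer_price = c(10.0, 10.5, 11.0, 11.5, 12.0),
  vpi        = 100 + 1.5 * (0:4)
)

base_year <- 2015  # hypothetical common base year

# Express both series relative to the base year (base year = 100),
# so a beer index of 120 means +20% since the base year.
beer_vpi <- beer_vpi %>%
  mutate(
    beer_index = 100 * beer_price / beer_price[year == base_year],
    vpi_index  = 100 * vpi / vpi[year == base_year]
  )
```

Comparing `beer_index` and `vpi_index` year by year then shows whether beer prices rose faster or slower than the general price level.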

2. My Workflow to Predict and Visualize Price Developments
Join Beer Price and VPI Data
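The original code for this step is not shown. Here is a minimal dplyr sketch that joins the beer price and VPI tables. It assumes both tables have one row per year and share a `year` column. The column names `beer_price` and `vpi` and all numbers are made-up placeholders.

```r
library(dplyr)

# Made-up placeholder inputs (NOT the real Oktoberfest or Destatis data).
beer_prices <- data.frame(year = 2010:2019, beer_price = 9 + 0.3 * (0:9))
vpi         <- data.frame(year = 2012:2019, vpi = 94 + 1.5 * (0:7))

# inner_join() keeps only the years present in both tables, matched on `year`.
beer_vpi <- inner_join(beer_prices, vpi, by = "year")
```

An inner join drops years that are missing from either source. If you prefer to keep every beer price year and get `NA` where the VPI is unavailable, `left_join()` is the alternative.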
Create Linear Regression Models
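A minimal sketch of the modelling step, assuming one simple linear model per series with the year as the only predictor. The data frame `observed`, the forecast horizon 2020 to 2024, and all values are hypothetical placeholders.

```r
# Made-up placeholder data (NOT real figures), one row per year.
noise    <- c(0.05, -0.02, 0.03, -0.04, 0.01, 0.02, -0.03, 0.04, -0.01, 0)
observed <- data.frame(
  year       = 2010:2019,
  beer_price = 9 + 0.3 * (0:9) + noise,
  vpi        = 92 + 1.2 * (0:9) + 5 * noise
)

# One simple linear model per series: value as a linear function of the year.
beer_model <- lm(beer_price ~ year, data = observed)
vpi_model  <- lm(vpi ~ year, data = observed)

# Years to predict (hypothetical horizon).
future <- data.frame(year = 2020:2024)

# interval = "confidence" returns a matrix with columns fit, lwr, upr
# for the mean response at the given level.
beer_pred <- predict(beer_model, newdata = future, interval = "confidence", level = 0.95)
vpi_pred  <- predict(vpi_model,  newdata = future, interval = "confidence", level = 0.95)
```

`interval = "confidence"` describes uncertainty about the fitted trend line. `interval = "prediction"` gives the wider band for individual future values.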


Join all Datasets
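One possible way to combine observed values and predictions is a single long-format data frame built with `dplyr::bind_rows()`, which suits ggplot2. The helper `series_with_forecast()` and the columns `series`, `value`, `lwr`, `upr`, and `type` are hypothetical names chosen for this sketch. The data are made-up.

```r
library(dplyr)

# Made-up placeholder data (NOT real figures).
noise    <- c(0.05, -0.02, 0.03, -0.04, 0.01, 0.02, -0.03, 0.04, -0.01, 0)
observed <- data.frame(year = 2010:2019,
                       beer_price = 9 + 0.3 * (0:9) + noise,
                       vpi        = 92 + 1.2 * (0:9) + 5 * noise)
future   <- data.frame(year = 2020:2024)

# Hypothetical helper: fit lm(<column> ~ year) and return the observed values
# plus the predictions (with confidence bounds) in long format.
series_with_forecast <- function(data, column, new_years) {
  model <- lm(reformulate("year", response = column), data = data)
  pred  <- predict(model, newdata = new_years, interval = "confidence")
  actual <- data.frame(year = data$year, series = column,
                       value = data[[column]],
                       lwr = NA_real_, upr = NA_real_, type = "actual")
  forecast <- data.frame(year = new_years$year, series = column,
                         value = unname(pred[, "fit"]),
                         lwr   = unname(pred[, "lwr"]),
                         upr   = unname(pred[, "upr"]),
                         type  = "predicted")
  bind_rows(actual, forecast)
}

plot_data <- bind_rows(
  series_with_forecast(observed, "beer_price", future),
  series_with_forecast(observed, "vpi", future)
)
```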

Create a Line Graph Including Predictions and Confidence Intervals
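Here is a minimal ggplot2 sketch of such a chart for a single series. Observed values appear as a solid line with points, predictions as a dashed line, and the 95% confidence interval as a shaded ribbon. The data, column names, and styling are assumptions, not the original chart.

```r
library(ggplot2)

# Made-up placeholder prices (NOT real Oktoberfest data), one row per year.
noise    <- c(0.05, -0.02, 0.03, -0.04, 0.01, 0.02, -0.03, 0.04, -0.01, 0)
observed <- data.frame(year = 2010:2019, beer_price = 9 + 0.3 * (0:9) + noise)
future   <- data.frame(year = 2020:2024)  # hypothetical forecast horizon

# Linear trend and 95% confidence interval for the future years.
model    <- lm(beer_price ~ year, data = observed)
pred     <- predict(model, newdata = future, interval = "confidence", level = 0.95)
forecast <- data.frame(year = future$year,
                       fit  = unname(pred[, "fit"]),
                       lwr  = unname(pred[, "lwr"]),
                       upr  = unname(pred[, "upr"]))

p <- ggplot() +
  # Shaded band: 95% confidence interval of the predicted trend
  geom_ribbon(data = forecast, aes(x = year, ymin = lwr, ymax = upr),
              fill = "grey80") +
  # Dashed line: predicted values
  geom_line(data = forecast, aes(x = year, y = fit), linetype = "dashed") +
  # Solid line with points: observed values
  geom_line(data = observed, aes(x = year, y = beer_price)) +
  geom_point(data = observed, aes(x = year, y = beer_price)) +
  labs(x = "Year", y = "Beer price (placeholder data)",
       title = "Observed beer price and linear-trend prediction") +
  theme_minimal()
```

Print `p` to draw the chart. You can swap `theme_minimal()` for a ggthemes theme such as `theme_economist()`, and repeat the same layers for the VPI series.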


Conclusion
This article showed you how I visualized price developments and how I incorporated price predictions using linear regression models. When I look at the result, I see two significant limitations.
Firstly, I am aware that linear regression is generally not the best predictor for prices, even though it might be justified in this case (i.e., predicting yearly Oktoberfest beer price changes). I included the margin of error in the visualization, but the confidence band of a linear model only widens with the distance from the observed years. It does not model how uncertainty accumulates with each additional forecast year. To address this, I think it is worthwhile to look into the Holt-Winters forecasting method and time series methods in general. Holt-Winters can also take seasonality into account.
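For comparison, here is a minimal sketch of the Holt-Winters alternative using base R's `stats::HoltWinters()`. The series is yearly, so there is no seasonal cycle to estimate. The seasonal component is therefore switched off with `gamma = FALSE`, which reduces the method to Holt's linear trend method. The prices are made-up placeholders.

```r
# Made-up placeholder prices (NOT real Oktoberfest data) as a yearly time series.
noise  <- c(0.05, -0.02, 0.03, -0.04, 0.01, 0.02, -0.03, 0.04, -0.01, 0)
prices <- ts(9 + 0.3 * (0:9) + noise, start = 2010, frequency = 1)

# Yearly data has no seasonal cycle, so the seasonal component is switched off
# (gamma = FALSE). The level (alpha) and trend (beta) smoothing parameters are
# estimated from the data by minimising the squared one-step-ahead errors.
hw <- HoltWinters(prices, gamma = FALSE)

# Forecast the next 5 years with 95% prediction intervals
# (a time-series matrix with columns fit, upr, lwr).
hw_pred <- predict(hw, n.ahead = 5, prediction.interval = TRUE, level = 0.95)
```

These prediction intervals are built from the model's one-step-ahead errors and grow with the forecast horizon. That behavior is closer to the year-over-year accumulation of uncertainty described above.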
Secondly, adding text annotations to the chart in a reproducible way makes the code quite cluttered and hard to maintain (or even to write in the first place). I am also unsure whether all the text annotations make the plot itself unbalanced and cluttered.
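One option that may keep annotation code easier to maintain is sketched here as an assumption, not a definitive fix. Store the labels in a small data frame and draw them with a single `geom_text()` layer instead of many separate `annotate()` calls. The `notes` table and its labels are hypothetical, and the prices are made-up.

```r
library(ggplot2)

# Made-up placeholder prices (NOT real data).
prices <- data.frame(year = 2010:2019, beer_price = 9 + 0.3 * (0:9))

# Hypothetical annotation table: one row per label, editable without touching plot code.
notes <- data.frame(
  year  = c(2010, 2019),
  label = c("first year in data", "last observed year")
)
# Look up each label's y position from the data instead of hard-coding it.
notes$beer_price <- prices$beer_price[match(notes$year, prices$year)]

p <- ggplot(prices, aes(x = year, y = beer_price)) +
  geom_line() +
  geom_point() +
  # A single layer draws all labels; vjust = -1 moves the text above each point
  geom_text(data = notes, aes(label = label), vjust = -1, size = 3) +
  theme_minimal()
```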
What do you think I could do to improve this solution?
Please feel free to contact me with any questions and comments. Thank you.