Microsoft has recently made a strong push to embrace open-source technologies and to bring Artificial Intelligence into its products, and Power BI is part of this plan. Power BI is one of the main dashboarding tools available today, and Microsoft keeps expanding its capabilities and flexibility.
To make dashboard development feasible, Power BI offers several features for data preparation, and one of the most important is its integration with R and, more recently, with Python. Being able to work in R and Python opens up a huge range of possibilities within the BI tool, one of which is to use machine learning libraries and build models directly in Power BI.
In this article I will walk step by step through how to train a machine learning model and make predictions with it directly in Power BI using the R language, covering the following topics:
- Installing the dependencies
- Analyzing the data
- Hands-on – the code
- Results
- Conclusion
1. Installing the dependencies
The first step is to install RStudio on your machine, since the development will be done in the R language. Although Power BI has native integration with R, it requires the user to install the R packages locally.
The download is available at this link: https://www.rstudio.com/products/rstudio/download/
Right after installation, open RStudio and install the dependency libraries:
- caret
- datasets
- monomvn
To install the packages in R, the R-bloggers website has a great tutorial that teaches how to install and load packages in RStudio. Link: https://www.r-bloggers.com/2013/01/how-to-install-packages-on-r-screenshots/
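As a minimal sketch, checking for and installing the missing packages can also be done straight from the RStudio console (note that the datasets package ships with base R, so usually only caret and monomvn need installing):

```r
# Packages used in this article; "datasets" already comes with base R
pkgs <- c("caret", "monomvn")

# Install only the packages that are not present yet
missing <- setdiff(pkgs, rownames(installed.packages()))
if (length(missing) > 0) install.packages(missing)
```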
2. Analyzing the data
The dataset was obtained from Kaggle and consists of beer consumption data from a university area in São Paulo, together with the minimum, maximum and average temperatures for each day and the rainfall volume.
To add another important feature, I created a column called "Weekend" that indicates whether the day is a Saturday or a Sunday. It captures part of the seasonality of consumption, since weekends show higher consumption. I could have included Friday as well, but for this first version I decided to be more conservative.
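As a sketch of how such a flag can be derived in base R — the date column name `Data` below is an assumption for illustration, not necessarily the name used in the Kaggle file:

```r
# A week of sample dates (2015-01-01 is a Thursday);
# "Data" is an assumed column name for the date field
df <- data.frame(Data = as.Date("2015-01-01") + 0:6)

# ISO weekday number via format(): 1 = Monday ... 7 = Sunday,
# so 6 and 7 flag Saturday and Sunday
df$Weekend <- as.integer(format(df$Data, "%u") %in% c("6", "7"))
```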

3. Hands-on – the code
For the tests, I put together a Bayesian linear regression model using the monomvn package (Bayesian ridge regression) to predict beer consumption in liters per day, validated with repeated 10-fold cross-validation.
I will not go into much detail about the model and its results in this article, as the goal is to focus on the integration with Power BI rather than on the modeling.
The first part of the code imports the libraries:
# Load the required libraries
library(caret)
library(datasets)
library(monomvn)
Right after that, we read the data that Power BI passes to the R script, exposed inside the script as a data frame named dataset:
mydata <- dataset            # data handed over by Power BI
mydata <- data.frame(mydata)
With that, we can create the model and run the prediction. In this case I did not set aside a test dataset; I used the training data with repeated 10-fold cross-validation to briefly analyze the training metrics.
# Repeated 10-fold cross-validation
fitControl <- trainControl(
  method = "repeatedcv",
  number = 10,
  repeats = 10)

# Bayesian ridge regression: features in columns 2 to 6, target (consumption) in column 7
lmFit <- train(mydata[1:365, 2:6], mydata[1:365, 7],
               method = "bridge", trControl = fitControl)

predictedValues <- predict(lmFit, newdata = mydata[, 2:6])
Finally, we create a new column in the Power BI dataset with the values generated by the model's prediction.
mydata$PredictedValues <- predictedValues
Full Code
# Load the required libraries
library(caret)
library(datasets)
library(monomvn)

# Read the data handed over by Power BI
mydata <- dataset
mydata <- data.frame(mydata)

# Repeated 10-fold cross-validation
fitControl <- trainControl(
  method = "repeatedcv",
  number = 10,
  repeats = 10)

# Bayesian ridge regression: features in columns 2 to 6, target in column 7
lmFit <- train(mydata[1:365, 2:6], mydata[1:365, 7],
               method = "bridge", trControl = fitControl)

# Predict and store the result as a new column
predictedValues <- predict(lmFit, newdata = mydata[, 2:6])
mydata$PredictedValues <- predictedValues
4. Results
Below is the complete dataset with the real value, the prediction and the error (%). The average error in training was 7.82%, with a maximum of 20.23% and a minimum of 0.03%.
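These statistics can be reproduced from the real and predicted vectors; a sketch with toy numbers (in the report, `real` would be `mydata[, 7]` and `pred` the `predictedValues` vector):

```r
# Toy vectors standing in for real consumption and the model prediction
real <- c(25.5, 28.9, 30.1, 26.4)
pred <- c(26.0, 28.0, 31.0, 26.5)

# Absolute percentage error per day
pct_error <- abs(real - pred) / real * 100

mean(pct_error)  # average error
max(pct_error)   # worst day
min(pct_error)   # best day
```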

Below is a graph with the real data in black and the predictions in red, along with the error shown as blue bars, for the entire period.

4.1. Correlation with temperature
Plotting beer consumption (in black/red) together with temperature (in blue), we see that consumption tracks the temperature variations between the months well, both the "micro" (daily) variations and the "macro" trends. Higher temperatures lead to greater consumption, for example at the end and beginning of the year, when it is summer, while the lower temperatures in winter lead to lower beer consumption in São Paulo.

4.2. Correlation between real data and prediction
Plotting the real values against the values predicted by the model, the ideal would be for the data to concentrate on the black dashed line (graph below), the scenario in which the predictions equal the real data. This correlation graph lets us see how dispersed the model's predictions are and whether they tend to be under- or overestimated.
Analyzing the correlation graph, the dispersion is not that high for an initial model, with an average error of 7.8% as seen previously. Looking at where the data concentrate, the model sometimes predicts above and sometimes below the real values, but in most cases it slightly overestimates consumption, predicting a higher value than the real one.
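A base-R sketch of this diagnostic, again with toy vectors (`real` and `pred` stand in for the real column and the prediction):

```r
# Toy vectors standing in for real consumption and the model prediction
real <- c(25.5, 28.9, 30.1, 26.4)
pred <- c(26.0, 28.0, 31.0, 26.5)

# Pearson correlation between real and predicted values
r <- cor(real, pred)

# Scatter plot with the ideal y = x line dashed in black
plot(real, pred, pch = 16, col = "red",
     xlab = "Real consumption (L)", ylab = "Predicted consumption (L)")
abline(a = 0, b = 1, lty = 2)
```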

4.3. Tests – 2018 data
After training the model with the 2015 data, I looked for data to run an inference on and obtained a 2018 dataset with temperature and rainfall data for the city of São Paulo.
As shown below, the values produced by inference on the 2018 data follow the same pattern as the real data, with consumption dropping in the middle of the year as temperatures fall and peaking on weekends.

4.3.1. Correlation with temperature
The predicted values plotted together with the temperature, in blue, confirm the correlation between consumption and temperature and show the same dynamics throughout the year in the inference data.

4.3.2. Seasonality of the weekend
One way to explain the seasonal cycles is the increase in beer consumption on weekends. Below is a graph of consumption against average temperature, where the black bars mark the weekend days, Saturday and Sunday.
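The size of this weekend effect can be checked with a simple aggregation; a sketch with toy data (in the report this would run over the real 2015 columns):

```r
# Toy data: a week of consumption values with the Weekend flag
df <- data.frame(
  consumption = c(22, 24, 23, 25, 26, 32, 31),
  Weekend     = c(0, 0, 0, 0, 0, 1, 1))

# Average consumption on weekdays vs. weekends
aggregate(consumption ~ Weekend, data = df, FUN = mean)
```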

This seasonality also appears in the real data: when we plot the real 2015 consumption, the high-consumption pattern repeats on weekends, which shows that the model, although simple, adapted well to the dynamics of the data.

5. Conclusion
As a graphical tool, Power BI provides great versatility and speed for building analytical visualizations from machine learning model outputs, while also offering an exploratory view of the underlying data. Incorporating the power of machine learning models into a BI tool is undoubtedly a major advance for developers working in analytical areas, and Power BI delivers this functionality in a simple and functional way.
Finally, for any questions or suggestions about the article or about machine learning, Power BI, R, Python and related topics, feel free to contact me on LinkedIn: https://www.linkedin.com/in/octavio-b-santiago/