NeuralProphet: Forecasting Energy Demand

The gap between classical forecasting techniques and deep learning models

Brendan Artley
Towards Data Science


Network Image — By JJying

Introduction

In this article, we use NeuralProphet (by Meta AI) to forecast energy demand. Forecasting energy demand is increasingly important as the demand for electricity grows. Knowing how much electricity is needed ahead of time has a significant impact on carbon emissions, energy costs, and policy decisions¹.

On November 30, 2021, Meta AI (formerly Facebook) released NeuralProphet. NeuralProphet was built to bridge the gap between classical forecasting techniques and deep learning models. In this article, I will showcase the NeuralProphet framework and evaluate its performance against other forecasting techniques.

Loading Data

The dataset we are going to use contains electricity demand data from Victoria, Australia. Victoria is home to 6.7 million people. The dataset has daily data points from January 1st, 2015 to October 6th, 2020. This gives us enough samples to pick up on any seasonality and trends in the dataset. This dataset can be found here.

In the following cell, we load this dataset into pandas for preprocessing. Similar to Prophet, NeuralProphet requires a ‘ds’ column for the date/time stamp, and a ‘y’ column for the data value.
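A minimal sketch of this preprocessing step (the CSV file name and the original column names are assumptions, not the exact ones in the dataset):

```python
import pandas as pd

# Load the Victoria electricity demand dataset (file name assumed for illustration).
df = pd.read_csv("victoria_electricity_demand.csv")

# NeuralProphet, like Prophet, expects a 'ds' (datestamp) and a 'y' (value) column.
df = df.rename(columns={"date": "ds", "demand": "y"})  # original column names assumed
df["ds"] = pd.to_datetime(df["ds"])
df = df[["ds", "y"]].sort_values("ds").reset_index(drop=True)
```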

We can then use Matplotlib to visualize the data. In the first plot, we can see all of the data points. There appears to be some yearly seasonality in the data: energy demand generally increases each year until June and then decreases for the rest of the year.

The second plot simply shows the first 100 days of data. We can see that energy demand is not consistent day to day. If there is any weekly seasonality in the data, it is difficult to identify from this plot.
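A simple way to produce both views with Matplotlib (figure sizes and titles are my own choices):

```python
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8))

# Full history: the yearly seasonality is easiest to see here.
ax1.plot(df["ds"], df["y"])
ax1.set_title("Daily energy demand (full history)")

# First 100 days: day-to-day variation dominates, weekly seasonality is hard to spot.
ax2.plot(df["ds"][:100], df["y"][:100])
ax2.set_title("Daily energy demand (first 100 days)")

plt.tight_layout()
plt.show()
```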

Energy Data Plot — By Author

In the next cell, we are simply creating a validation and testing set for the model. We could use NeuralProphet’s built-in .split_df() function, but I found that this duplicated rows in the validation and test set. For this reason, we will use simple indexing.

Note that the validation and testing set should always contain the most recent data points. This ensures that we do not train on data from the future and make predictions on the past.
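A simple index-based split might look like the following (the split sizes are assumptions, not the exact values used):

```python
# Hold out the most recent data for validation and testing.
val_size, test_size = 180, 180  # assumed split sizes

df_train = df.iloc[:-(val_size + test_size)]
df_val = df.iloc[-(val_size + test_size):-test_size]
df_test = df.iloc[-test_size:]
```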

Split Data Plot — By Author

Defining a Model

NeuralProphet uses a very similar API to Prophet. If you have used Prophet before, then using NeuralProphet will be very intuitive.

In the following cell, we are simply defining a model and fitting it to the data. We use ‘D’ to set the prediction frequency to daily, and we use the ‘plot-all’ progress option to visualize model performance live during training. The only other alteration we make is to add Australian holidays.
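A sketch of that cell (the exact setup may differ slightly):

```python
from neuralprophet import NeuralProphet

m = NeuralProphet()
m.add_country_holidays("AU")  # add Australian holidays as special events

# 'D' sets daily frequency; progress="plot-all" shows live loss/metric plots during training.
metrics = m.fit(df_train, freq="D", validation_df=df_val, progress="plot-all")
```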

Model Loss Plot — By Author
Model Metrics — By Author

We can see from the graph above that the model is overfitting the data. The training loss keeps decreasing, but what we really want is a model that fits well on unseen data (i.e. the validation set).

Looking at the metric plots above, we can see that the optimal parameters are reached at around 25–30 epochs, after which the model starts to overfit. We can combat this by limiting the number of epochs. A complete list of tuneable model parameters can be found here.
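For example, capping training at roughly the point where the validation error bottomed out (30 epochs here is an assumption based on the plot):

```python
# Re-define the model with a fixed number of epochs to avoid overfitting.
m = NeuralProphet(epochs=30)
m.add_country_holidays("AU")
metrics = m.fit(df_train, freq="D", validation_df=df_val)
```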

Model Metrics — By Author

By specifying the number of epochs, we significantly reduce the validation RMSE. Even changing one parameter can improve our model significantly (as shown above). This suggests that using parameter tuning and translating domain knowledge to the model can improve its performance.

Model Evaluation

Before we try to squeeze every ounce of performance out of our model, let’s see how we can evaluate it.

In the next cell, we are simply making a forecast that is the same length as the validation set. We can then visualize this using the .plot() function. This gives a decent visualization of the forecast, but does not provide a performance metric, nor can we see the predictions very clearly.
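A sketch of this step using NeuralProphet’s built-in helpers:

```python
# Forecast over the validation horizon and visualize with the built-in plot.
future = m.make_future_dataframe(df_train, periods=len(df_val))
forecast = m.predict(future)
fig = m.plot(forecast)
```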

Forecast Plot — By Author

To address the limitations of the built-in plot, I put together a customized plot using Matplotlib. The following cell plots the predictions with the true labels and shows the model metrics in the plot title.
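A rough version of such a plot (‘yhat1’ is NeuralProphet’s default forecast column; the styling is my own):

```python
import numpy as np

# Compare predictions with the true validation values and show RMSE in the title.
preds = forecast["yhat1"].values[-len(df_val):]
rmse = np.sqrt(np.mean((df_val["y"].values - preds) ** 2))

plt.figure(figsize=(12, 5))
plt.plot(df_val["ds"], df_val["y"], label="Actual")
plt.plot(df_val["ds"], preds, label="Predicted")
plt.title(f"Validation forecast (RMSE: {rmse:.2f})")
plt.legend()
plt.show()
```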

Custom Prediction Plot — By Author

Next, we can look at the model parameters. This can give us a sense of seasonality patterns and the trend of the data.
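In NeuralProphet this is available through the built-in parameter plot:

```python
# Plot trend, seasonalities, and holiday effects learned by the model.
fig_param = m.plot_parameters()
```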

In the first and second plots, we can see that there was a spike in energy demand in 2018. Then, the demand dips and steadily increases throughout 2019 and 2020. This gives us a sense of how energy demand changes over time.

In the third plot, we are looking at the yearly seasonality. We can see that energy demand is at its lowest in April and October, and at its highest in July. This makes sense, as July is the coldest month of the year in Australia. Interestingly, the warmest month is February, when we see a small spike in energy demand. This could indicate that people use electricity for air conditioning during the hottest month.

The fourth plot shows the weekly seasonality. It indicates that energy consumption is at its lowest on Saturday and Sunday.

Finally, we have the plot of the additive events. This plot shows the effect of the Australian holidays that we added. We can see that on holidays, energy demand is typically lower than usual.

Model Parameter Plot — By Author

Adding AR-Net (AutoRegression)

One of the new additions in NeuralProphet is AR-Net (Auto-Regressive Neural Network). This allows NeuralProphet to use observations from previous time steps when making a prediction. In our case, this means that the model can use the previous days’ energy demand to make its predictions.

AR-Net can be enabled by setting an appropriate value for the n_lags parameter when creating the NeuralProphet model. We also increase the changepoints_range, as we are making short-term predictions on the data.
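A sketch of such a model (the specific values below are assumptions, not the exact ones used):

```python
# Enable AR-Net by adding lagged inputs.
m = NeuralProphet(
    n_lags=10,                # use the previous 10 days as autoregressive inputs (assumed value)
    changepoints_range=0.95,  # allow trend changepoints closer to the end of the series
    epochs=30,
)
m.add_country_holidays("AU")
metrics = m.fit(df_train, freq="D", validation_df=df_val)
```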

AR-Net Model Metrics — By Author

We can see from the metrics above that the validation RMSE decreased again. This is another significant gain in model performance we got by simply tuning two parameters.

If we use the same code that we did previously, only one prediction is made. It is unclear from the docs how to make “running” predictions when AR-Net is enabled, so we can use the following code to make this possible. If anyone knows a built-in way to do this, please let me know!
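One workaround is a rolling one-step-ahead loop over the validation period, sketched below (an illustrative approach, not necessarily the exact code used):

```python
# For each validation day, predict one step ahead from everything observed so far.
history = df_train.copy()
running_preds = []

for i in range(len(df_val)):
    future = m.make_future_dataframe(history, periods=1)
    forecast = m.predict(future)
    running_preds.append(forecast["yhat1"].values[-1])
    # Append the true observation so the next step uses real (not predicted) lags.
    history = pd.concat([history, df_val.iloc[[i]]], ignore_index=True)
```

This is slow, since it calls predict() once per day, but it keeps the lagged inputs grounded in observed values rather than earlier predictions.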

We can then use the following code block to plot our predictions. We can see from the plot that the model is starting to pick up on outlying points.
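A minimal version of that plot:

```python
plt.figure(figsize=(12, 5))
plt.plot(df_val["ds"], df_val["y"], label="Actual")
plt.plot(df_val["ds"], running_preds, label="AR-Net predictions")
plt.legend()
plt.show()
```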

Prediction Plot — By Author

If we then plot the model components, we can see that there is an additional plot shown. This plot shows how much each lagged term affects the prediction. In our case, we can see that the most recent days are the most important to the model, which is often the case in time series problems.

Model Parameters Plot — By Author

Hyperparameter Tuning

Up to this point, we have been able to improve our validation RMSE manually. This is pretty good, but we only tuned a couple of parameters. What about other parameters? Consider the following list of tuneable parameters and their default values.

NeuralProphet(growth='linear', changepoints=None, n_changepoints=10, changepoints_range=0.9, trend_reg=0, trend_reg_threshold=False, yearly_seasonality='auto', weekly_seasonality='auto', daily_seasonality='auto', seasonality_mode='additive', seasonality_reg=0, n_forecasts=1, n_lags=0, num_hidden_layers=0, d_hidden=None, ar_reg=None, learning_rate=None, epochs=None, batch_size=None, loss_func='Huber', optimizer='AdamW', newer_samples_weight=2, newer_samples_start=0.0, impute_missing=True, collect_metrics=True, normalize='auto', global_normalization=False, global_time_normalization=True, unknown_data_normalization=False)

It would take a lot of time and effort to manually try all the possible combinations of these parameters. We can address this by implementing hyperparameter tuning. In this implementation, we simply test every combination of parameters in a parameter grid, which means that the number of combinations grows exponentially as more parameters are added².

This could potentially be improved by using Bayesian optimization to search the parameter space more efficiently, but adding this functionality is out of the scope of this article. In the following cell, we create a parameter grid and then train a model for each possible parameter combination.
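A sketch of such a grid search (the grid values below are placeholders, not the exact grid used):

```python
import itertools

# Hypothetical parameter grid.
param_grid = {
    "n_lags": [7, 14, 30],
    "n_changepoints": [10, 30],
    "learning_rate": [0.01, 0.1],
}

results = []
keys, values = zip(*param_grid.items())

# Train one model per parameter combination.
for combo in itertools.product(*values):
    params = dict(zip(keys, combo))
    m = NeuralProphet(epochs=30, **params)
    m.add_country_holidays("AU")
    metrics = m.fit(df_train, freq="D", validation_df=df_val)
    results.append({**params, "metrics": metrics})
```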

Next, we create a Pandas dataframe to store the lowest RMSE value from each model training cycle. We can then sort by the validation RMSE to get a sense of which parameter combinations worked well. The training RMSE and the epoch at which the validation score was at its lowest are also stored.

This is done to ensure that the model is not overfitting to the validation set.
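One way to build that summary table (the metric column names ‘RMSE’ and ‘RMSE_val’ are assumptions about the metrics dataframe returned by fit):

```python
# Summarize each run by the epoch with the lowest validation RMSE.
rows = []
for r in results:
    metrics = r["metrics"]
    best_epoch = metrics["RMSE_val"].idxmin()
    rows.append({
        **{k: v for k, v in r.items() if k != "metrics"},
        "best_epoch": best_epoch,
        "train_rmse": metrics.loc[best_epoch, "RMSE"],
        "val_rmse": metrics.loc[best_epoch, "RMSE_val"],
    })

results_df = pd.DataFrame(rows).sort_values("val_rmse")
```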

Best Model Parameters — By Author

Looking at the results above, we can see that the first and second rows appear to be overfitting the validation set. On the other hand, the third row shows a similar RMSE score on both the training and validation sets.

In the following cell, we re-enter the high-scoring model parameters that worked well. We can enable the progress plot to see more information on the model training, and we can make further changes manually if needed.
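For example (the parameter values below are placeholders standing in for the best row of the results table):

```python
# Re-create the model with the best parameters from the grid search.
m = NeuralProphet(epochs=30, n_lags=14, n_changepoints=10, learning_rate=0.01)
m.add_country_holidays("AU")
metrics = m.fit(df_train, freq="D", validation_df=df_val, progress="plot-all")
```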

Training Plot — By Author
Model Score — By Author

We have reduced the RMSE even more! As model performance improves, it becomes more and more difficult to make further gains. That being said, we are looking for progress, not perfection, and will take improvements where we can.

The forecast can then be plotted in the same way as we did earlier in the article.

Best Model Predictions — By Author

Model Performance Comparison

In the next cell, I am going to compare the NeuralProphet model with other common forecasting strategies.

  • Predict Last Value
  • Exponential Smoothing
  • SARIMA
  • NeuralProphet

We can manually calculate the RMSE value for each model using sklearn. We simply pass squared=False to the mean_squared_error function to get RMSE instead of MSE.

Firstly, we can calculate the RMSE if we just predicted the energy demand from the day before.
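A sketch of this baseline (assuming the train/validation/test frames defined earlier):

```python
from sklearn.metrics import mean_squared_error

# Predict each test day's demand with the previous day's observed value.
prev_day = pd.concat([df_val["y"].iloc[[-1]], df_test["y"].iloc[:-1]]).values
rmse_last_value = mean_squared_error(df_test["y"], prev_day, squared=False)
print(f"Last value RMSE: {rmse_last_value:.2f}")
```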

Next, we can calculate a forecasting model using exponential smoothing. This model type uses the weighted averages of past observations, with weights decaying exponentially as observations get older.
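A sketch using statsmodels (the additive trend/seasonality settings and the weekly seasonal period are assumptions):

```python
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Fit on train + validation data and forecast the test period.
train_val = pd.concat([df_train, df_val])
es_model = ExponentialSmoothing(
    train_val["y"].values, trend="add", seasonal="add", seasonal_periods=7
).fit()
es_preds = es_model.forecast(len(df_test))
rmse_es = mean_squared_error(df_test["y"], es_preds, squared=False)
```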

Next, we can fit a SARIMA model to the data. The acronym stands for “Seasonal Auto-Regressive Integrated Moving Average”, and the forecast is built from exactly those components. For more information on this model type, check out this great article here.

This model is a little more complex, so we will break the training into code blocks. Firstly, the optimal model parameters are found using auto_arima. This is essentially a hyperparameter tuning package for ARIMA models.
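A sketch using pmdarima (the weekly seasonal period m=7 and the search settings are assumptions):

```python
import pmdarima as pm

# Search for reasonable (p, d, q)(P, D, Q, m) orders, then forecast the test period.
sarima_model = pm.auto_arima(
    train_val["y"].values,
    seasonal=True, m=7,
    stepwise=True, suppress_warnings=True,
)
sarima_preds = sarima_model.predict(n_periods=len(df_test))
rmse_sarima = mean_squared_error(df_test["y"], sarima_preds, squared=False)
```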

Finally, we can make predictions with the NeuralProphet Model.

Now that all the predictions are made, we can compare the RMSE scores on the test dataset.

The last value and exponential smoothing methods yield the highest error, SARIMA achieves the second-lowest error, and NeuralProphet performs the best. I was surprised by how close the SARIMA forecast came to NeuralProphet’s. It would be interesting to take this a step further and see how these models perform on other time series tasks.

Model Comparison Plot — By Author

Closing Thoughts

  • Code for this article can be found here.
  • NeuralProphet is a very intuitive framework that is still in the early stages of development. If you want to contribute to this project, you can do so on GitHub. You can also join the NeuralProphet community on Slack!
  • I did not include exogenous variables in this article, but I think these would boost model performance. The predictions from the NeuralProphet model could also be used as an input feature to an LGBM/XGBoost model, which would likely yield a very accurate forecast.

References

  1. Energy for Sustainable Development — MD. Hasanuzzaman, Nasrudin Abd Rahim — 04/03/2017
  2. Handbook of Statistics — Jean-Marie Dufour, Julien Neves — 01/05/2019
