
Forecasting Germany’s Solar Energy Production: A Practical Approach with Prophet

Analysis and implementation with Python

Photo by Pixabay: https://www.pexels.com/photo/blue-solar-panel-board-356036/

Table of Contents

Introduction
Why Forecast Solar Power?
Data
Exploratory Data Analysis
Why Prophet?
Evaluation Criteria for Models
Baseline Model
Prophet Model (Default Hyperparameters)
Prophet Model (Tuned Hyperparameters)
Results and Discussion
Future Steps
Conclusion
References


Introduction

Germany is currently undergoing the Energiewende, a long-term transition to a net-zero carbon economy that predominantly uses renewable energy resources to generate electricity. Solar power plays a pivotal role in ensuring Germany’s energy security.

Therefore, the success of this transition hinges greatly on the ability to accurately predict future solar energy output. This article explores the feasibility of forecasting solar energy generation in Germany using the Prophet library.


Why Forecast Solar Power?

Forecasting solar power production brings a number of benefits:

1. Ensure that supply meets demand

A key challenge in the transition to renewable energy is to ensure that the clean energy sources are able to satisfy the electricity demand at any given time. Forecasting future solar energy production will be essential for supplying sufficient energy while avoiding any shortages or surpluses.

2. Improve storage management

Surplus electricity generated by solar power can be stored in batteries. Forecasting generation helps optimize how the energy produced while the sun is shining is stored and dispatched, minimizing the amount of energy that is curtailed in the process.

3. Bolster growth in the solar energy market

Accurate forecasts help establish solar energy as a reliable energy source. A consistently performing solar energy market will attract more investment, which will result in more construction and maintenance of solar panels.

4. Protect the environment

Accurate solar energy forecasts are conducive to greater energy security. Establishing solar energy as a reliable renewable energy source will expedite the shift away from carbon-emitting energy sources and help combat climate change as a whole.


Data

The time series dataset used in this project was provided by Agora Energiewende. Their team offers the Agorameter, which tracks the power generation of various sources on an hourly basis.


Exploratory Data Analysis

A thorough EDA will shed light on the nature of the data, identify limitations, and point towards models most suited for forecasting solar energy output with this data.

1. Preview the dataset

Preview (Created by Author)

2. Search for missing values in the dataset

Missing Values (Created by Author)

3. Visualize the time series

Time Series (Created by Author)

4. Examine the distribution of values with a box plot

Box Plot (Created by Author)

5. Examine the distributions by hour

Distributions for Each Hour (Created by Author)

6. Examine the distributions by month

Distributions for Each Month (Created by Author)
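
The six steps above can be sketched roughly as follows. This is a minimal illustration and not the author's code: the data frame is a synthetic stand-in for the Agorameter export, the column names `ds` and `y` are assumptions chosen to match Prophet's expected input, and plotting calls are left out so that the sketch only needs pandas and NumPy.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the hourly solar series (NOT the Agorameter data):
# a daylight bump between 6 am and 6 pm, scaled by a yearly cycle.
idx = pd.date_range("2023-08-01", "2024-08-31 23:00", freq="h")
hour = idx.hour.to_numpy()
day_of_year = idx.dayofyear.to_numpy()
daylight = np.clip(np.sin((hour - 6) / 12 * np.pi), 0, None)
season = 0.6 + 0.4 * np.cos((day_of_year - 172) / 365 * 2 * np.pi)
df = pd.DataFrame({"ds": idx, "y": 40 * daylight * season})

# Steps 1 and 2: preview the data and count missing values per column.
print(df.head())
missing = df.isna().sum()
print(missing)

# Step 4: the summary statistics that a box plot visualizes, plus skewness.
print(df["y"].describe())
print("skewness:", df["y"].skew())

# Steps 5 and 6: average generation per hour of day and per calendar month.
by_hour = df.groupby(df["ds"].dt.hour)["y"].mean()
by_month = df.groupby(df["ds"].dt.month)["y"].mean()
print("peak hour:", by_hour.idxmax(), "peak month:", by_month.idxmax())
```

For the visual steps, a line plot of `y` against `ds` and box plots grouped by hour or month would show the same patterns graphically.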

Key Takeaways from the EDA:

The dataset captures solar power generation from August 1st, 2023 to August 31st, 2024 (1 year and 1 month) on an hourly basis. Overall, the dataset contains 9,528 records. While there are no missing values, the data distribution is skewed to the right, with many high-value outliers.

The time series plot shows that the data is very cyclical in nature, a strong indicator of seasonal components. From examining the data distributions by month and hour, it is evident that the amount of solar power generated heavily depends on the month of the year and the time of day.

In terms of time, solar power generation peaks from 10 am to 5 pm and is near-zero from 8 pm to 6 am. In terms of months, the generation is at its highest during summer (June to August) and is at its lowest during winter (November to February). This aligns with our expectations, as sunlight is most available during summer and during the daytime.


Why Prophet?

Prophet is Meta’s own library dedicated to developing forecasting models. It generates predictions based on an additive regression model that performs well even in the presence of missing values and outliers.

Under the hood, Prophet uses Fourier series (i.e., capturing patterns with sines and cosines) to model the seasonal components in a time series, making it capable of accounting for multiple seasonal patterns in a dataset.
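
For reference, the seasonal term for a single period can be written as a truncated Fourier series, as described in the Prophet paper. Here P is the length of the period (for example, one day or one year, in the same units as t), N is the number of harmonics, and a_n and b_n are the coefficients fitted during training:

```latex
s(t) = \sum_{n=1}^{N} \left( a_n \cos\!\left(\frac{2\pi n t}{P}\right) + b_n \sin\!\left(\frac{2\pi n t}{P}\right) \right)
```

Each seasonal pattern gets its own term of this form, and the terms are summed, which is how several seasonalities can coexist in one model.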

Given that the solar power dataset contains outliers and exhibits seasonality at a daily and yearly level, Prophet is suited for this forecasting task. In addition, it is preferable to alternatives like neural networks since it requires less time and fewer computational resources to train.


Evaluation Criteria for Models

The models will be trained with a training set comprising 1 year’s worth of records and evaluated with a test set comprising 1 month’s worth of records.
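
One way to produce such a split is a simple timestamp cutoff. The sketch below assumes that the final calendar month (August 2024) is the test period. The article does not state the exact boundary, and the frame here holds placeholder values only.

```python
import pandas as pd

# Placeholder frame with the assumed column names ds (timestamp) and y (value).
idx = pd.date_range("2023-08-01", "2024-08-31 23:00", freq="h")
df = pd.DataFrame({"ds": idx, "y": 0.0})

# Assumed cutoff: everything before August 2024 is used for training.
cutoff = pd.Timestamp("2024-08-01")
train = df[df["ds"] < cutoff]
test = df[df["ds"] >= cutoff]

print(len(train), len(test))
```

Splitting by time rather than at random keeps the test set strictly in the future relative to the training set, which mirrors how a forecast would be used in practice.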

When forecasting solar power generation, large errors are especially detrimental as they can lead to greater shortages and surpluses of power. Thus, the models will be evaluated based on the root-mean-squared error metric, which heavily penalizes larger errors (compared to other traditional metrics like mean absolute error).
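
To make the difference between the two metrics concrete, here is a small sketch using only the standard library. The function names and toy numbers are illustrative and do not come from the project.

```python
import math


def rmse(actual, predicted):
    """Root-mean-squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


actual = [0, 0, 0, 0]
many_small_errors = [1, 1, 1, 1]  # four errors of size 1
one_large_error = [0, 0, 0, 4]    # a single error of size 4

# Both forecasts have the same MAE, but the single large miss
# produces a higher RMSE.
print(mae(actual, many_small_errors), rmse(actual, many_small_errors))
print(mae(actual, one_large_error), rmse(actual, one_large_error))
```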

Root Mean Squared Error Formula (Created by Author)

Baseline Model

A baseline model will be created to contextualize the performances of the Prophet models. The baseline model will predict every value to be equal to the average power generation in the training dataset.
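
A mean baseline of this kind takes only a few lines. The sketch below uses made-up numbers and a hypothetical helper name; it shows the idea rather than the project's actual implementation.

```python
import math
from statistics import mean


def mean_baseline(train_values, horizon):
    """Forecast a constant: the average of the training values."""
    level = mean(train_values)
    return [level] * horizon


def rmse(actual, predicted):
    """Root-mean-squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Toy values only, not taken from the dataset.
train_values = [0, 10, 20, 10]
test_values = [0, 20]

forecast = mean_baseline(train_values, horizon=len(test_values))
print(forecast, rmse(test_values, forecast))
```

Because the forecast is flat, it cannot follow the daily cycle at all, which is what makes it a useful lower bar for the Prophet models.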

Baseline Model Forecast (Created by Author)

The baseline model registers a root-mean-squared error of 16.63.


Prophet Model (Default Hyperparameters)

First, we train a Prophet model with the training data using the default hyperparameters to evaluate the performance without any tuning.

This is done by simply instantiating the Prophet model and fitting it to the training dataset. During training, Prophet finds the parameters that best capture the trend and seasonality of the dataset.
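
In code, these steps might look like the sketch below. It assumes the `prophet` package is installed and that the training frame uses Prophet's required column names, `ds` and `y`. The helper names are hypothetical and the author's exact code is not shown in this article.

```python
import pandas as pd


def to_prophet_frame(timestamps, values):
    """Build the two-column frame Prophet expects: ds (datetime) and y (value)."""
    return pd.DataFrame({"ds": pd.to_datetime(timestamps), "y": values})


def fit_default_prophet(train):
    """Fit a Prophet model with every hyperparameter left at its default."""
    # Imported here so the helpers above can be used without prophet installed.
    from prophet import Prophet

    model = Prophet()
    model.fit(train)
    return model


def forecast_hours(model, n_hours):
    """Predict the next n_hours hourly steps after the end of the training data."""
    future = model.make_future_dataframe(
        periods=n_hours, freq="h", include_history=False
    )
    return model.predict(future)[["ds", "yhat"]]
```

With a training frame `train`, calling `forecast_hours(fit_default_prophet(train), 31 * 24)` would return point forecasts for a 31-day test month, assuming the test period starts directly after the training data ends.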

The model’s generated forecast is visualized below:

Default Prophet Model Forecast (Created by Author)

The Prophet model with default hyperparameters yields a root-mean-squared error of 7.84.


Prophet Model (Tuned Hyperparameters)

A hyperparameter tuning procedure will help identify the hyperparameter set that best fits the training data, thereby further enhancing the performance of the Prophet model. For this task, the project utilizes the mango library, which is designed to use an informed search to converge to the best hyperparameters in a given search space.

First, the hyperparameter search space is defined:

Next, the objective function is created to compute the root-mean-squared errors for all considered hyperparameters.

Finally, this objective function is run for 50 trials, after which the hyperparameter combination that yields the lowest root-mean-squared error is identified.
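
The three steps can be sketched as follows. Everything specific here is an assumption made for illustration: the hyperparameters and value ranges in `search_space`, the use of a held-out validation frame `valid` for scoring, and the helper names. Only the Prophet arguments and the Mango `Tuner` interface are real, and the code assumes both `prophet` and `arm-mango` are installed when `run_search` is called.

```python
# Hypothetical search space; the ranges are illustrative, not the author's.
search_space = {
    "changepoint_prior_scale": [0.001, 0.01, 0.1, 0.5],
    "seasonality_prior_scale": [0.01, 0.1, 1.0, 10.0],
    "seasonality_mode": ["additive", "multiplicative"],
}


def make_batch_objective(score_fn):
    """Wrap a single-configuration scorer into the batch form Mango calls."""
    def objective(params_batch):
        return [score_fn(**params) for params in params_batch]
    return objective


def prophet_rmse(train, valid, **params):
    """Fit Prophet with the given hyperparameters and score it on valid."""
    from prophet import Prophet  # imported lazily; needed only when scoring

    model = Prophet(**params)
    model.fit(train)
    predicted = model.predict(valid[["ds"]])["yhat"].to_numpy()
    errors = valid["y"].to_numpy() - predicted
    return float((errors ** 2).mean() ** 0.5)


def run_search(train, valid, n_trials=50):
    """Run Mango's informed search and return the best setting and its RMSE."""
    from mango import Tuner  # imported lazily; needed only when searching

    objective = make_batch_objective(
        lambda **params: prophet_rmse(train, valid, **params)
    )
    tuner = Tuner(search_space, objective, {"num_iteration": n_trials})
    results = tuner.minimize()
    return results["best_params"], results["best_objective"]
```

Scoring on a validation slice that is separate from the final test month keeps the reported test error from being influenced by the search itself. Whether the original project did this is not stated here.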

The model’s generated forecast is visualized below:

Tuned Prophet Model Forecast (Created by Author)

The tuned Prophet model registers a root-mean-squared error of 4.18.


Results and Discussion

Forecasts from all Methods (Created by Author)
Results Summary (Created by Author)

Overall, both Prophet models performed considerably better than the baseline, which failed to capture the nature of the data.

The Prophet model trained with default hyperparameters accurately grasped the seasonality component of the solar power dataset, as evidenced by how the forecasted values rise and drop at the same time as the actual values. However, it grossly underestimates the power generation at peak times and overestimates the power generation at off-peak times. The latter issue is much more egregious, as it means that the model predicts that German solar panels are consistently generating power at nighttime, which is largely untrue.

In comparison, the tuned Prophet model not only excels at capturing the seasonality components of the dataset but also captures the magnitude of power generation at peak and off-peak times. However, the model is unable to match the day-to-day variation in solar power generation that is present in the test data.

The shortcomings of the best-performing model can mainly be attributed to the lack of information directly tied to solar power generation. Without addressing this limitation, it will be difficult to produce a better-performing model.


Future Steps

While the best-performing Prophet model generates accurate predictions with a univariate time series modeling approach, it would likely register even better results if trained with additional information, which would entail turning this univariate time series problem into a multivariate one. Specifically, two pieces of information are of interest:

1. Weather

Solar energy production is strongly correlated with a number of weather attributes, such as cloud cover (i.e., the proportion of the sky covered by clouds), humidity, and precipitation. Collecting these pieces of information would improve the model’s ability to predict the peak solar power generation for each day.

2. Investments into the Solar Energy Market

Solar energy production currently does not show an upward trend, with the peaks falling within the same range. However, that stands to change as Germany continues to invest in its solar energy market. There is a high likelihood of collecting more energy per day over time. As such, it is essential to capture key information, such as the number of installed solar panels and the capacity of batteries.


Conclusion

Photo by Ann H: https://www.pexels.com/photo/close-up-of-scrabble-tiles-forming-the-words-the-end-2889685/

Prophet models have attributes conducive to capturing the cyclical nature of solar energy production. Their use of Fourier series ensures good results with very little computational demand. This sets Prophet apart from other modeling strategies such as neural networks and SARIMA.

That being said, a model is only as powerful as the data it is trained with. Prophet models’ performance will inevitably plateau when only provided with solar energy production data. For optimal performance, including information on weather and solar power infrastructure is vital.

For access to the code used in this project, please visit the GitHub repository:

https://github.com/anair123/Forcasting-Solar-Energy-with-Prophet

Thank you for reading!


References

  1. Agora Energiewende (2023): Agorameter. Model version 3.0, Berlin, 13.09.21
