
Predicting excess wind electricity in Ireland: Machine Learning against Climate Change! – Part 1

Can Machine Learning algorithms uncover hidden patterns in a complex electricity network for reliable predictions?

Data for Change

Time series predictions can be tricky when consumption patterns shift, the grid is constrained and weather conditions change abruptly. We are happy to share our experience with a range of ML algorithms that help us optimize electricity consumption and reduce our carbon footprint!

Photo by RawFilm on Unsplash

We’ll explain the problem we were trying to solve, the data we used and explored (EDA), and how we handled missing data, collinearity, outliers and feature transformations to get ready for robust modeling.

Then we’ll cover the Machine Learning / Neural Network model candidates, a specific Training / Validation split for a time series with a strong trend, and a comparison of results between the models and (spoiler alert) the default EirGrid forecast!

The problem

Renewable Energy is a fundamental element for Europe’s plan to cut carbon emissions by at least 55% by 2030, compared with 1990 levels. We explore Ireland’s situation and potential smart usage of the wind-generated electricity.

"In Ireland, the growth in energy demand for the next ten years varies between 23% in the low demand scenario, to 47% in the high scenario." according to the Eirgrid All-Island Generation Capacity Statement 2019–2028.

As of 2018, wind energy contributes 80% of renewable electricity and 30% of total electricity demand. Ireland’s ambition is to increase the renewable electricity to 70% of total, and the EU targets 32% by 2030 [1]. However, there is growing industry concern about the amount of wind energy "lost" every year. In 2020 this amounted to more than 1.4 million MWh of electricity, nearly double the figure for 2019. This is just under 11.5 per cent of total production and enough to power more than 300,000 homes, according to the Wind Energy Ireland 2021 report.

But why the "wasted" power?

The Transmission System Operator (TSO), EirGrid in Ireland, is responsible for balancing the electricity flows from generation to consumers at all times.

Figure 1 - The electricity grid must be balanced at all times between Generation and Demand. Image by author

When electricity generation exceeds consumption, the levers the TSO can adjust are limited:

  • Redirect electricity to "storage": in Ireland, pump water up to Turlough Hill Power Station (but limited)
  • Export (market-permitting) to UK: max. 1 GW Connection (Ewic + Moyle)
  • Ask Gas / Coal Generation Plants to ramp down, although this may take up to a few hours
  • The maximum proportion of Wind / Solar electricity is constrained by the System Non-Synchronous Penetration (SNSP) limit, i.e. the level of non-synchronous renewables allowed on the system at any given time. It stood at "its current figure of 65% in Q1 2018" and has recently been increased to 70%.
  • "Renewable Dispatch-Down" (Constraint and Curtailment): this basically means disconnecting wind farms from the grid, causing wind energy to be "lost", as can be seen in the EirGrid Group System and Renewable Reports.

Large investments in the electricity grid to support a higher share of renewable energy will be facilitated through the European Green Deal. However, renewable capacity will also increase massively, resulting in far more "wasted" electricity.

Changes in consumer and industrial user behavior will also be required, and these are the focus of this project. As shown in Fig. 2, if wind electricity is predicted to reach the current 70% limit, then:

  • Industrial users (like Data Centers) can charge batteries for later use.
  • Consumers can program their electric appliances to run over those hours, for example: 1) Charge an electric car; 2) Launch washing machine with tumble drier; 3) Increase heat pump etc.
Figure 2 - www.smartgriddashboard.com Example when wind electricity was likely "wasted"

The data

Figure 3 Data Sources: Supported by EirGrid Group Data and Met Éireann data
  1. Met Éireann data: Copyright Met Éireann, Source www.met.ie, Licence Statement: This data is published under a Creative Commons Attribution 4.0 International (CC BY 4.0).
  2. EirGrid Group Data: Supported by EirGrid Group Data, Source: www.smartgriddashboard.com, Open Data Licence.

As shown in Fig. 3, a wind power dataset with 145,936 observations spanning Jan 2017 – Feb 2021 was downloaded from EirGrid Group for the island of Ireland, as the Republic and Northern Ireland operate together as one Integrated Single Electricity Market (I-SEM). The data records wind-generated electricity and electricity demand at a sampling frequency of 15 minutes. To build the complete picture, we added the total Wind Capacity installed on the island of Ireland, which is reported monthly in the "[System and renewable data summary report," Eirgrid Group, Tech. Rep., 2020](https://www.eirgridgroup.com/site-files/library/EirGrid/System-and-Renewable-Data-Summary-Report.xlsx).

The historical weather information downloaded from Met Éireann contains hourly observations (37,225 rows) from each of four meteorological stations located at Shannon Airport, Dublin Airport, Cork Airport and Belmullet. These were chosen because many grid-connected wind farms are located close by, and because Dublin is a major population centre that drives electricity consumption. In a later phase of the work, we found that even the best models underestimated wind electricity generation when wind was low in Dublin. We realized that wind speeds were high in the North of Ireland, where we didn’t have specific weather station data. Hence, data from the Malin Head station was also added to the weather dataset.

Data quality, missing data and outliers

Overall, the data quality from both sources is excellent for the last 3 years.

In the Eirgrid data set, 66 rows of 15 min periods were Missing Completely at Random (MCAR) and were thus Backfilled.
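A minimal sketch of that backfill step, assuming the readings sit in a pandas DataFrame indexed by timestamp. The sample values are made up for illustration and this is not the code from the project notebooks.

```python
import pandas as pd

# Illustrative 15-minute readings with the 00:30 row missing entirely.
readings = pd.DataFrame(
    {"ActualWindMW": [1200.0, 1210.0, 1250.0]},
    index=pd.to_datetime(
        ["2020-01-01 00:00", "2020-01-01 00:15", "2020-01-01 00:45"]
    ),
)

# Re-index on a strict 15-minute grid so the gap shows up as NaN,
# then backfill: each gap takes the next available observation.
regular = readings.asfreq("15min")
filled = regular.bfill()
```

With only 66 isolated rows missing out of more than 145,000, taking the next reading is a reasonable choice; longer gaps would call for a different treatment.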

In the historical Met Éireann data, a chunk was missing from the start of 2017, so the whole dataset was trimmed to start on July 1st, 2017, without impact on the models.

Looking at outliers in the temperature and wind data, we found they were consistent with short-term Irish extremes: temperatures (very rarely above 30 degrees!) and storms (more frequent, good for wind!).

Figure 4 - Met outliers analysis

We were surprised to find some negative values for wind energy, but found that cyclical aerodynamic loads on the turbine blades can have a negative impact on the wind turbine, mainly due to enhanced wind shear.

Figure 5 - Electricity Demand and Generation outliers

Outlier control charts also provided insights into trends in electricity generation and demand, in particular the seasonality and the rising amount of wind electricity generation.

Figure 6 - Actual Wind Electricity Generation Control Chart

Control charts are a handy way to spot univariate outliers.
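The original code embed did not survive on this page, so here is a minimal sketch of the idea instead, assuming classic mean ± 3 standard deviation control limits. The function names are hypothetical, and the project notebooks may compute their limits differently.

```python
import numpy as np

def control_limits(values, n_sigma=3.0):
    """Return (centre, lower, upper) control limits for a 1-D series."""
    values = np.asarray(values, dtype=float)
    centre = values.mean()
    spread = values.std(ddof=1)  # sample standard deviation
    return centre, centre - n_sigma * spread, centre + n_sigma * spread

def flag_outliers(values, n_sigma=3.0):
    """Boolean mask of the points falling outside the control limits."""
    values = np.asarray(values, dtype=float)
    _, lower, upper = control_limits(values, n_sigma)
    return (values < lower) | (values > upper)

def plot_control_chart(values, title="Control chart"):
    """Plot the series with its centre line and control limits."""
    import matplotlib.pyplot as plt  # imported lazily, only needed for plotting

    centre, lower, upper = control_limits(values)
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(values, marker=".", linewidth=0.5)
    ax.axhline(centre, color="green", label="mean")
    ax.axhline(upper, color="red", linestyle="--", label="upper limit")
    ax.axhline(lower, color="red", linestyle="--", label="lower limit")
    ax.set_title(title)
    ax.legend()
    return ax
```

Calling `flag_outliers` on a column such as the actual wind generation gives the rows worth a closer look; `plot_control_chart` draws the same limits over the series.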

SEAI Monthly generation data was also cross-checked against the Republic of Ireland 15-min data to confirm overall quality.

Collinearity handling

The intuition here is that the total possible production of electricity depends closely on the weather conditions close to the main wind farms, in particular as was found in [2], [3] and [4]: wind speed, wind direction, relative humidity and mean sea level air pressure (in hectopascal). Conversely, the electricity consumption depends on the hour of the day, business vs. weekend days but also on the air temperature, as seen in [6], [7] and [8].

However, since the weather data from multiple stations is required to have the full view, a lot of the measures will be correlated.

Collinearity is likely to reduce model performance and to obscure the impact of individual features, so it should be avoided whenever possible.

We removed features that were very highly correlated (above 0.9) or had a high Variance Inflation Factor (VIF), for example temperature at several weather stations, resulting in a more manageable dataset:

Figure 7 - Main Time and weather features correlations

To check multicollinearity, the variance_inflation_factor function from statsmodels is a good choice. A rule of thumb is that if any VIF is greater than 10, you really need to consider dropping variables from your model.
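To show what the VIF measures, here is a small numpy-only sketch on synthetic data. It relies on the identity that the VIF of feature j is the j-th diagonal entry of the inverse correlation matrix, which is the same as regressing that feature on all the others (with an intercept) and computing 1 / (1 - R²). The column names and values are invented for illustration.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF of each column of X (rows = observations, columns = features).

    Uses VIF_j = [R^-1]_jj, where R is the feature correlation matrix.
    """
    X = np.asarray(X, dtype=float)
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

rng = np.random.default_rng(0)
temp_dub = rng.normal(10, 5, size=500)
temp_cork = temp_dub + rng.normal(0, 0.5, size=500)  # nearly a copy of temp_dub
wind_speed = rng.normal(12, 4, size=500)             # independent feature

vifs = variance_inflation_factors(
    np.column_stack([temp_dub, temp_cork, wind_speed])
)
```

The two temperature columns come out far above the threshold of 10 while the independent wind speed stays close to 1, which is the pattern that led to dropping redundant station temperatures.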


For the detailed implementation, see the relevant Colab files and documents listed in the ReadMe on GitHub.

Features transformations

Transform time into 2D

From the Fast Fourier Transform of the temperature, wind speed and actual wind power shown in Fig. 8, we can see obvious peaks at the 1/day and 1/year frequency components, which means the data has daily and yearly patterns.

Figure 8 - Weather features Fast Fourier Transform
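As a rough illustration of how such a peak can be located, the sketch below runs a real FFT over a synthetic hourly series with a daily cycle. The signal is made up and the helper name is hypothetical; on the real data the same approach would be applied to the temperature, wind speed and wind power columns.

```python
import numpy as np

def dominant_period_hours(hourly_values):
    """Period (in hours) of the strongest non-constant frequency component."""
    values = np.asarray(hourly_values, dtype=float)
    spectrum = np.abs(np.fft.rfft(values - values.mean()))
    freqs = np.fft.rfftfreq(values.size, d=1.0)  # cycles per hour
    peak = np.argmax(spectrum[1:]) + 1           # skip the zero-frequency bin
    return 1.0 / freqs[peak]

# Synthetic "temperature": a daily cycle plus noise, sampled hourly for 60 days.
hours = np.arange(24 * 60)
rng = np.random.default_rng(1)
temperature = (
    10 + 4 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 0.5, hours.size)
)

period = dominant_period_hours(temperature)
```

Several years of data are needed before the yearly peak becomes visible in the same way.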

In order to emphasize these patterns in our models, we need to convert the 1D observation timestamps into a 2D periodic radian time space (Fig. 9), as suggested in [9].

Figure 9: Date / Time transformations

Here we transform the time into two radian time spaces: one for the yearly period [yearSin, yearCos] and one for the daily period [daySin, dayCos].

Time transformation
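The exact formula from the original image is not reproduced here; the sketch below uses the common sine / cosine encoding of the timestamp, which is an assumption on our side. The constants and the function name are illustrative.

```python
import math
from datetime import datetime, timezone

DAY_SECONDS = 24 * 60 * 60
YEAR_SECONDS = 365.2425 * DAY_SECONDS  # mean Gregorian year

def time_features(ts):
    """Map a timestamp to (daySin, dayCos, yearSin, yearCos)."""
    seconds = ts.timestamp()
    day_angle = 2 * math.pi * seconds / DAY_SECONDS
    year_angle = 2 * math.pi * seconds / YEAR_SECONDS
    return (
        math.sin(day_angle), math.cos(day_angle),
        math.sin(year_angle), math.cos(year_angle),
    )

midnight = time_features(datetime(2020, 6, 1, 0, 0, tzinfo=timezone.utc))
late = time_features(datetime(2020, 6, 1, 23, 59, tzinfo=timezone.utc))
```

The point of the encoding is that 23:59 and 00:00 end up next to each other on the circle, instead of at opposite ends of a 0 to 24 scale.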

2D wind vector

As shown in Fig. 10, the wind direction is recorded in degrees, which does not make a good model input: 360° and 0° should be close to each other and wrap around smoothly. Also, the wind direction should have no effect on the model when the wind is not blowing. Therefore, it is more sensible to combine the wind speed and direction into a 2D wind vector feature [windSin, windCos].

Fig. 10. Transform wind speed and direction into 2D wind vector
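A minimal sketch of that transformation, assuming the speed is in knots and the direction in degrees. Which component is called windSin and which windCos is our assumption; the notebooks may use the opposite convention.

```python
import math

def wind_vector(speed_knots, direction_deg):
    """Combine wind speed and direction into (windSin, windCos) components."""
    angle = math.radians(direction_deg)
    return speed_knots * math.sin(angle), speed_knots * math.cos(angle)

north_a = wind_vector(10.0, 0.0)    # direction recorded as 0 degrees
north_b = wind_vector(10.0, 360.0)  # same direction recorded as 360 degrees
calm = wind_vector(0.0, 270.0)      # no wind: direction is irrelevant
```

Both encodings of a northerly wind map to the same point, and a calm hour collapses to the origin whatever direction was recorded.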

As mentioned, look out for Part 2, which will cover the model candidates, a specific Training / Validation split for a time series with a strong trend, and the results!


Goal reminder!

Amongst other things, achieving Europe’s plan to cut carbon emissions by at least 55% by 2030 will require changes in consumer and industrial electricity user behavior, which is the focus of this project. As shown in Fig. 11, if wind electricity is predicted to reach the current 70% limit, then:

  • Industrial users (like Data Centers) can charge batteries for later use.
  • Consumers can program their electric appliances to run over those hours, for example: 1) Charge an electric car; 2) Launch washing machine with tumble drier; 3) Increase heat pump etc.
Fig. 11 Very windy day forecast! - best times to charge batteries between 1 and 4 AM / around 3 PM

With this goal in mind, the success of the model predictions will be measured mainly as follows:

  • The main relevant metric for this work is the Mean Absolute Error (MAE), as absolute values are what we are trying to measure in order to recommend when to charge batteries.
  • Exact predictions are most important when the proportion of actual wind generation is high, as when wind is low the electricity Carbon Intensity will be bad anyway (other renewables like Solar and Hydro have a low impact currently in Ireland)
  • The Root Mean Squared Error (RMSE) and explained variance regression score are also measured from the models for better understanding of the model limitations.
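For reference, the three scores can be written in a few lines of numpy. This is a sketch with made-up numbers; scikit-learn provides equivalent functions.

```python
import numpy as np

def mae(actual, predicted):
    """Mean Absolute Error."""
    return float(np.mean(np.abs(np.asarray(predicted) - np.asarray(actual))))

def rmse(actual, predicted):
    """Root Mean Squared Error."""
    return float(np.sqrt(np.mean((np.asarray(predicted) - np.asarray(actual)) ** 2)))

def explained_variance(actual, predicted):
    """1 - Var(residuals) / Var(actual); 1.0 is a perfect score."""
    actual = np.asarray(actual, dtype=float)
    residuals = actual - np.asarray(predicted, dtype=float)
    return float(1.0 - residuals.var() / actual.var())

actual_mw = np.array([100.0, 200.0, 300.0, 400.0])
predicted_mw = np.array([110.0, 190.0, 310.0, 390.0])

scores = (
    mae(actual_mw, predicted_mw),
    rmse(actual_mw, predicted_mw),
    explained_variance(actual_mw, predicted_mw),
)
```

MAE and RMSE are in the same unit as the target (MW), so they translate directly into how far off a charging recommendation could be.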

Training / Validation Split for a time series with a strong upwards trend

The last 2 weeks of March 2021 in the dataset are reserved as the Test set, and the rest of the dataset is split into Training and Validation sets. The standard random split using scikit-learn gives excellent Validation results but very poor Test results. This is because, for time series data, the models typically predict a value close to the last/next value. With a randomly shuffled set, this value will typically be very close to the actual one, so there is, in effect, Data Leakage.

A standard way to split Training / Validation sets in time series is to simply split the data at a date roughly at the 80% mark (as recommended in [10]). However, as shown in Part 1, there is a continuous upwards trend in the target variable, so results on the Test set for the most recent data are poor.

For this reason, the dataset is split on a particular day (the 22nd) of each month: the Training set includes all dates up to the 22nd of the month and the Validation set all dates after the 22nd, thus preserving data in all years (for trend) and months (for seasonality). For a given high-performing model and feature set (Random Forest model and 2DTime), results on the Test set are significantly better with the custom Training-Validation split.
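The day-of-month split described above can be sketched with the standard library alone. The helper name and the two-month sample are illustrative; with a pandas DataFrame the same idea amounts to filtering on the day of the timestamp index.

```python
from datetime import datetime, timedelta

SPLIT_DAY = 22  # day of month used as the boundary in the article

def split_by_day_of_month(timestamps, split_day=SPLIT_DAY):
    """Training = days 1..split_day of every month, Validation = the rest."""
    train = [t for t in timestamps if t.day <= split_day]
    validation = [t for t in timestamps if t.day > split_day]
    return train, validation

# Hourly timestamps covering January and February 2020 (60 days).
start = datetime(2020, 1, 1)
hourly = [start + timedelta(hours=h) for h in range(24 * 60)]

train, validation = split_by_day_of_month(hourly)
```

Every month contributes to both sets, so the Validation set sees the same trend level and season as the Training set, without the hour-to-hour leakage of a random shuffle.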

Input Features Set

In order to explore the influence of each input feature on the models, the models are trained and tested on different sets of features (Table I). The impact of each input feature is examined by comparing the results from different input sets.

Table I - features sets combinations

The sets which include the Actual Wind (MW) from the previous 24H were inspired by the Medium article "Predict daily electric consumption with neural networks" [11], which covered related requirements.

Models candidates

Random Forest

We selected the Random Forest Regression model as the prototype for our early analysis, to get an idea of which features make a significant difference. It also handles linear and non-linear relationships quite well, and balances bias and variance. The research on power predictions in [12] also uses such models. The default Random Forest parameters lead to fully grown and unpruned trees, which can potentially be very large. In this case, the results were very good and training took under a few minutes, so the defaults were fine. Note that standard scikit-learn GridSearch implementations can be difficult to use with time series because of the possible "data leakage" between close-by hours in nested cross-validation, as pointed out above. The best results were found with the "Rhum_Msl" feature set, which includes the standard wind speeds as well as the Relative Humidity and Sea-level Pressure data.

Random Forest Regression evaluation on the last 2 weeks of data reserved for Testing (March 15th to 29th, 2021) ⇒ Mean Absolute Error (MAE): 219. As shown in Fig. 12, the validation errors show a roughly uniform distribution, apart from a few outliers. The period highlighted in the green box is around the 1st lockdown in April 2020, when patterns (mostly in energy demand) understandably changed dramatically.
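A minimal sketch of such a Random Forest baseline with scikit-learn defaults, on synthetic data. The two features, the toy relationship between wind speed and generation, and the simple holdout are assumptions for illustration only; the project notebooks use the real EirGrid and Met Éireann data with the day-of-month split.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
n = 600
wind_speed = rng.uniform(0, 30, n)      # knots (synthetic)
rel_humidity = rng.uniform(50, 100, n)  # percent (synthetic, unrelated to target)
# Toy target: generation rises with wind speed and saturates at 4000 MW.
generation = 4000 * np.clip(wind_speed / 25, 0, 1) ** 2 + rng.normal(0, 50, n)

X = np.column_stack([wind_speed, rel_humidity])
X_train, X_hold = X[:480], X[480:]
y_train, y_hold = generation[:480], generation[480:]

# Default parameters grow full, unpruned trees (max_depth=None).
model = RandomForestRegressor(random_state=0)
model.fit(X_train, y_train)

predictions = model.predict(X_hold)
holdout_mae = float(np.mean(np.abs(predictions - y_hold)))
# Naive reference: always predict the training mean.
baseline_mae = float(np.mean(np.abs(y_train.mean() - y_hold)))
importances = dict(
    zip(["wind_speed", "rel_humidity"], model.feature_importances_)
)
```

Comparing `holdout_mae` with the naive baseline and reading `importances` is the same kind of quick check we used to decide which features were worth keeping.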

Figure 12 - Random Forest error on validation set

As shown in Fig. 13, the predictions based on the Met Éireann historical data closely follow the actual Wind generation values on the Test set. Note that EirGrid’s own forecast for wind generation (Eirgrid Forecast Wind) tends to overshoot the actual generation when demand is relatively low. Conversely, predictions from the proposed RF model are more accurate and effectively reflect the fact that the grid can only cope with a maximum ratio of wind electricity.

Figure 13 - Random Forest Predictions vs. Actual Wind generation (MW)

Feature importances (Fig. 14) according to the Random Forest model have to be interpreted carefully, mostly because some residual collinearity remains between weather station measurements, but they give an idea of the features important to the model.

Wind speed in Shannon (wdsp) in knots, as well as wind speeds in Malin Head (wdsp MAL), Cork (wdsp COR) and Belmullet (wdsp BEL) are of course key in predicting the overall wind generation as most wind farms are in those areas. The total wind power capacity in the island of Ireland (TotalWindCapacityMW) has increased year on year and is a major factor too. Day in year and Hour matter for weather patterns and demand seasonality. The current Temperature in Dublin (temp DUB) is also important, presumably because it impacts demand.

Figure 14 - Features Importance

Artificial Neural Network – hourly model

The main advantage of ANN models is their self-learning capacity to determine complex relations among variables while keeping high data tolerance. However, in order to achieve accurate prediction, the self-learning processes of ANNs require large amounts of data and the corresponding high cost of computation. Thanks to the explosive growth of available data and computation power, ANN models have been successfully used for modeling non-linear problems and complex systems in forecasting wind power generation and energy consumption [13], [14], [15].

Therefore, this project also employs the neural network method to compare to other models. The ANN model in this work is built using the Keras library of Tensorflow. There are different versions of the ANN model corresponding to the feature sets shown in Table I. All versions were tested with different model settings, ranging from 2 to 5 dense layers with 20 to 260 neurons per layer. Based on the results of these experiments, we settled on an ANN model with 3 layers of 120 neurons and a final layer with 10 neurons. The model uses the Adam optimizer and the rectified linear (ReLU) activation function, as ReLU outperforms other functions (such as Softplus, Sigmoid and Hyperbolic functions) in this project.

In Fig. 15, the training and testing results for the different feature sets suggest: 1) the 2D-time feature yields better performance, whereas the wind vector does not perform as expected; 2) the ANN model for the ‘time & rhum’ dataset is chosen for the later evaluation and comparison.

Fig. 15 - Training and Validation Loss (MAE) per feature set

Long Short-Term Memory Model

The LSTM network is a sensible choice for this project thanks to its ability to learn both short-term and longer-term seasonal patterns from the weather observations. This work implements Recurrent Neural Network (RNN) models based on the "Tensorflow core tutorial: Time series forecasting" to predict Ireland’s wind electricity generation over the next 24 hours (Fig. 16). Two variants are implemented:

  • An LSTM where the model makes the entire sequence prediction in a single step.
  • An Autoregressive LSTM which decomposes this prediction into individual time steps. Each output can then be fed back into the model at each step, so that predictions are conditioned on the previous one, as in the classic Generating Sequences With RNNs [16].

Both models use a 24-hour window of previous weather values and actual wind power as input, but they don’t use the current weather forecast for the next 24 hours. As a result, their performance is suboptimal.
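To make the windowing concrete, here is a small numpy sketch that cuts an hourly series into pairs of 24 input hours and the following 24 target hours. The function name and the dummy arrays are assumptions; the TensorFlow tutorial builds its windows with its own helper class.

```python
import numpy as np

def make_windows(features, target, input_hours=24, output_hours=24):
    """Slice hourly data into (past features, future target) pairs."""
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    inputs, labels = [], []
    last_start = len(target) - input_hours - output_hours
    for start in range(last_start + 1):
        split = start + input_hours
        inputs.append(features[start:split])             # previous 24 hours
        labels.append(target[split:split + output_hours])  # next 24 hours
    return np.stack(inputs), np.stack(labels)

# Three days of dummy hourly data with 3 features per hour.
n_hours = 72
dummy_features = np.arange(n_hours * 3, dtype=float).reshape(n_hours, 3)
dummy_target = np.arange(n_hours, dtype=float)

X, y = make_windows(dummy_features, dummy_target)
```

Each label row only covers hours that come after its input window, which is what keeps the future out of the inputs.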

Fig. 16. Examples of 24h LSTM Predictions vs. Actual Wind generation

Artificial Neural Network – 24 H model

As a result of the findings above, we tried a Neural Network again but based on a single-shot prediction of the whole 24 H, similar to the LSTM above.

The intuition is that wind electricity generation will not only depend on the current winds blowing across Ireland but also on what happened in the hours before. For example, if a gas-fired power station is up and running at a high point and winds start to pick up, as the power station may take a few hours to wind down, wind electricity generation will be "dispatched down" for a little while.

Besides, the Wind generation level immediately preceding the 24 H window may inform the model too, so a new feature set also includes this data.

The new ANN model is also built using the Keras library of Tensorflow. It takes as input the aggregated 24 H for the required N features and consists of 5 layers of N * 24 neurons, followed by 2 layers to flatten to a vector of 24 H predictions.

Similarly to the hourly ANN models, the training and testing results for the different feature sets suggest: 1) the 2D-time feature yields better performance, whereas the wind vector does not perform as expected; 2) the ANN model for the ‘time & rhum & prev actual’ dataset is chosen for the later evaluation and comparison.

Fig. 17 - Training and Validation Loss (MAE) per features set

Drum roll! The results!

The AI models proposed in this work are evaluated and compared using MAE over the Test set (last 2 weeks of March 2021) for the best feature set per model. The predictions of the models are also compared to the benchmark of the work, which is the wind energy generation forecast by EirGrid. As shown in Fig. 18, both the Random Forest and ANN models provide higher accuracy (lower MAE) than EirGrid’s forecast. However, the LSTM model performs worst. This is because the current LSTM relies solely on historical data (the previous 24 hours), and its performance is expected to improve when weather features for the forecast window are incorporated in future work.

Fig. 18 - Test set MAE for each model type (best feature set)

But wait: are those last 2 weeks of March 2021, which are totally new data for the models as required for a Test set, representative of future performance? We can look at the Validation set to get an idea.

Fig. 19 - Validation Set MAE

Uh-oh, the results here are not as dramatic, though great to see that the 24H ANN Model still performs best. Why could that be? Is there any pattern to the error which we should be aware of?

In fact, there is, if we look at the errors (Predicted Value – Actual Value) vs. the Actual Value:

Fig. 20 - Validation Set EirGrid Forecast error by ActualWindMW

As we have seen in some examples, the forecast provided by EirGrid tends to overestimate Wind generation when there is a lot of wind and doesn’t seem to take into account the Grid SNSP constraints: we can see above that the error is positively correlated with the Actual value. This is true in particular for yellow points (2021) and orange points (2020) where more Wind Capacity was available, as well as a higher ratio of SNSP support in the grid in 2021.
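The relationship between error and actual value can also be summarized with a single number. The sketch below computes the signed error and its Pearson correlation with the actual value, on a toy forecast that overshoots more at high output; the numbers are invented and do not come from the EirGrid data.

```python
import numpy as np

def error_vs_actual(actual, predicted):
    """Return signed errors (predicted - actual) and their correlation with actual."""
    actual = np.asarray(actual, dtype=float)
    errors = np.asarray(predicted, dtype=float) - actual
    return errors, float(np.corrcoef(actual, errors)[0, 1])

actual_wind = np.array([500.0, 1500.0, 2500.0, 3500.0])
overshooting_forecast = actual_wind * 1.1  # overshoot grows with the output level

errors, correlation = error_vs_actual(actual_wind, overshooting_forecast)
```

A correlation near zero would mean the forecast is equally good at all wind levels; a clearly positive value matches the overshooting pattern visible in the scatter plot.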

Fig. 21 - Validation Set 24H ANN error by ActualWindMW

On the other hand, our 24 Hour ANN Model tends to underestimate slightly at lower actual values. The range of errors in general is smaller too.

As the MAE scores are pretty similar, it’s also worth checking the Explained Variance Score. On this score, the EirGrid forecast performs worse.

Fig. 22 - Validation Set Explained Variance Score per model - closer to 1 is better

Let’s have a closer look at those last 2 weeks of March:

Fig. 23 - Comparing 24h ANN Predictions, Eirgrid Forecast and Wind Actual

The predictions are all very good during the 1st week, when there is little wind and little wind generation.

When the wind reaches the maximum capacity of the grid (about 70% of the actual Demand at the time), the EirGrid forecast significantly overshoots, while our best model only slightly underestimates.

Our chosen Machine Learning models, both Neural Networks and Random Forest, are thus able to discover hidden patterns of electricity generation and demand from a few simple weather station measurements, the hour and day of the year, and the connected wind farm capacity.

Our model is ready for production! :=)

Next steps

For production, a simple website with predictions updated every evening, for users to check nightly, might be enough; a small budget should be sufficient.

However, maintaining the model and keeping it re-trained regularly would take more effort. A number of wind farms, as well as solar production, are under way as a result of the first Renewable Electricity Support Scheme (RESS) auction.

With more time, we’d like to continue investigating the LSTM model with a more suitable feature set.

Furthermore, as shown here, features selection and transformation have a great impact on the modeling performance. This means further feature engineering research can also improve the prediction of the models. One of the improvements is likely to be found from Fast Fourier Transform and/or Wavelet Transform, which can illustrate seasonal patterns of the features in the frequency domain.


Thank you!

We’d like to thank you for reading this far! We’d also like to thank our Data Mining professor at Dublin City University, Dr Andrew Mccarren, for his clear teaching in managing data projects and his feedback on our initial project, as well as Kevin McElwee for the 24H model inspiration.

Authors

Kangyu Pan, Krystian Matusz, Catherine Lalanne

References

[1] "All island generation capacity statement 2020-2028," Eirgrid Group, Tech. Rep., 2020. [[Online](https://www.eirgridgroup.com/site-files/library/EirGrid/All-Island-Generation-Capacity-Statement-2020-2029.pdf)]

[2] J. Haslett and A. E. Raftery, "Space-time modelling with long-memory dependence: Assessing ireland’s wind power resource," Journal of the Royal Statistical Society. Series C (Applied Statistics), vol. 38, no. 1, pp

[3] T. Brahimi, "Using artificial intelligence to predict wind speed for energy application in saudi arabia," Energies, vol. 12, p. 4669, 12 2019.

[4] K. P. Moustris, D. Zafirakis, D. H. Alamo, R. J. Nebot Medina, and J. K. Kaldellis, "24-h ahead wind speed prediction for the optimum operation of hybrid power stations with the use of artificial neural networks," in Perspectives on Atmospheric Sciences, T. Karacostas, A. Bais, and P. T. Nastos, Eds. Cham: Springer International Publishing, 2017, pp. 409–414.

[5] A. Lahouar and J. Ben Hadj Slama, "Hour-ahead wind power forecast based on random forests," Renewable Energy, vol. 109, pp. –, 03 2017.

[6] C. S. Chua, Z. Li, M. H. Lin, and J. Y. Quah, "Predicting energy demand with neural networks." [Online]. Available: https://towardsdatascience.com/forecasting-energy-consumption-using-neural-networks-xgboost-2032b6e6f7e2

[7] P. W. Khan, Y.-C. Byun, S.-J. Lee, D.-H. Kang, J.-Y. Kang, and H.-S. Park, "Machine learning-based approach to predict energy consumption of renewable and nonrenewable power sources," Energies, vol. 13, no. 18, 2020. [Online]. Available: https://www.mdpi.com/1996-1073/13/18/4870

[8] R. Gramillano, "Predicting electricity demand in la." [Online]. Available: https://towardsdatascience.com/predicting-electricity-demand-in-la-outperforming-the-government-a0921463fde8

[9] Moon, J, Park, J, Hwang, E, et al. Forecasting power consumption for higher educational institutions based on machine learning. J Supercomput 2018; 74: 3778–3800.

[10] scikit-learn Time-series Split [Online]

[11] Kevin McElwee, "Predict daily electric consumption with neural networks." 2020. [Online].

[12] V. Natarajan and N. Kumari, Wind Power Forecasting Using Parallel Random Forest Algorithm. [Springer, Singapore], 01 2020, vol. 1048, pp. 209–224. [Online]. Available: https://doi.org/10.1007/978-981-15-0035-0_16

[13] A. S. Qureshi and A. Khan, "Adaptive transfer learning in deep neural networks: Wind power prediction using knowledge transfer from region to region and between different task domains," Computational Intelligence, vol. 35, pp. 1088–1112, 2019.

[14] D. Widodo, N. Iksan, E. Udayanti, and Djuniadi, "Renewable energy power generation forecasting using deep learning method," IOP Conference Series: Earth and Environmental Science, vol. 700, p. 012026, 03 2021.

[15] P. W. Khan, Y.-C. Byun, S.-J. Lee, D.-H. Kang, J.-Y. Kang, and H.-S. Park, "Machine learning-based approach to predict energy consumption of renewable and nonrenewable power sources," Energies, vol. 13, no. 18, 2020. [Online]. Available: https://www.mdpi.com/1996-1073/13/18/4870

[16] A. Graves, "Generating sequences with recurrent neural networks," 2014.

GitHub Link to the Notebooks

CA683-Group99/Wind-Energy-Prediction

