
Wind Energy Trade with Deep Learning – Time Series Forecasting

This article was written by Rosana de Oliveira Gomes on behalf of the Deep Delve team for the AI4Impact Deep Learning Datathon 2020.

Predicting wind energy production with AI


Source: Unsplash.

Introduction to the Problem

Now more than ever, the world needs to collaborate on clean energy solutions to fight the climate crisis. The Sustainable Development Goals, adopted by all United Nations Member States in 2015, aim to ensure access to affordable, reliable, sustainable, and modern energy for all by 2030. Clean energy comes from renewable resources supplied by nature, such as sun, wind, tides and waves, and geothermal heat, among others. Its uses range from electricity generation, both at large scale and off grid (for rural and remote areas) [1], to heating/cooling systems and transport. However, renewable sources such as solar and wind energy depend on the weather and are more volatile than traditional sources. As many nations increase their share of renewable energy supplies [2,3], it is important to guarantee that these clean energy sources provide a stable supply while replacing fossil-fuel-based energy.

In the AI4Impact Deep Learning Datathon 2020, the topic of clean energies was investigated employing Deep Neural Networks. In the first part of the challenge, the teams worked on predicting energy demand using the Kaggle data set containing over 10 years of hourly energy consumption data from PJM Interconnection LLC. You can check some insights from team Deep Delve on this first part of the project in the video below.

The second part of the challenge was dedicated to forecasting wind energy in order to maximize trading profits. Together with solar, wind energy is one of the most prominent renewable energy sources, providing 4.8% of worldwide electricity supply in 2018 [4,5] and being responsible for 15% of the electricity consumed in Europe in 2019 [6]. Wind energy is produced when the mechanical power of the wind drives turbines that generate electricity. Because wind has variable intensity over time and may stop blowing intermittently, electricity produced by this source is commonly combined with other power sources in order to improve reliability and stability.

The economics of wind energy can be understood by considering the trades among different kinds of energy companies. Regional or national energy companies buy pre-determined amounts of energy (measured in kWh) from energy producers, which are companies that operate wind farms (in the case of wind energy). As a steady supply of energy is expected from the grid, energy producers can be penalized with substantial fines by governments in the case of power outages.

Energy trading companies play an important role in evaluating the risk of shortfall in energy transactions by helping to predict the expected energy production (especially in the case of wind, as a non-steady energy source). Energy traders predict the production of energy (in our case, wind energy) on behalf of the energy producers, taking into consideration two scenarios:

  • in the case of a shortfall below the prediction, energy is bought on the spot market to supply the grid (with prices above the average energy price)
  • in the case of excess over the forecast production, energy producers are not compensated for the extra energy.

In this sense, precise forecasting of energy production plays a fundamental role in the financial performance of wind farms (i.e. wind energy producers).

Problem Statement

Wind energy is highly dependent on environmental factors such as wind speed. It is critical for energy traders to successfully predict wind energy production in order to maximize profits. By applying Deep Learning to financial risk, we aim to build a wind energy forecast model with a lead time of 18 hours and a resolution of 1 hour for the Ile de France region. The goal is to implement a model that optimizes profits for wind farms by minimizing both excesses and shortfalls of energy production relative to the forecast.

Methodology

In order to evaluate how well our wind energy model performs, we estimate how much monetary profit the model would achieve when compared to real energy produced. The model is tested using real data over time during the evaluation period, following the trading cycle:

  • Warmup: first 18 hours in which no trades are performed (started on July 22nd 2020, at 00:00 UTC).
  • Trading Period: produce a wind energy forecast for the next 18 hours (T+18) every hour, including weekends, public holidays, 24/7 (started on July 22nd 2020, at 18:00 UTC).
  • Trading ends at the end of the evaluation period (July 28th 2020, at 23:00 UTC).

The analysis of financial performance follows the trade methods discussed before, with the following specifications:

  • Price of energy (kWh): 10 euro cents.
  • Maximum energy (kWh) sold per day: maximum forecast of the day.
  • Energy production excess: energy producer is not compensated for extra energy generated.
  • Energy shortage: buy energy from the spot market (20 euro cents/kWh). The amount of energy that can be bought depends on the cash at hand, and buying is only possible while the balance is positive.
  • Initial cash reserve: 10,000,000 euro cents, used to buy the missing energy in the case of a shortage. This amount is returned at the end of the evaluation.
  • Debt (cumulative): in the case of a shortage in which the cash at hand is less than the amount required for the purchase, a fine of 100 euro cents per kWh is issued. This is recorded as a negative value and added to the cash at hand.
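
The trading rules above can be turned into a small profit simulation. The following is a minimal Python sketch, not the official Datathon scoring code: the function name `simulate_trading`, the order of operations within each hour (revenue is credited before any spot purchase) and the omission of the daily sales cap are all assumptions made for illustration.

```python
def simulate_trading(forecasts, actuals, price=10, spot_price=20,
                     fine=100, initial_cash=10_000_000):
    """Return the profit in euro cents for hourly forecasts and actuals (kWh).

    Assumptions (not stated in the article): the forecast amount is sold each
    hour, revenue is credited before any spot purchase, and the daily sales
    cap is ignored.
    """
    cash = initial_cash
    for forecast, actual in zip(forecasts, actuals):
        # Sell the forecast amount; excess production earns nothing extra.
        cash += forecast * price
        shortfall = max(forecast - actual, 0)
        if shortfall > 0:
            # Buy as much of the shortfall as the positive balance allows.
            bought = min(shortfall, max(cash, 0) / spot_price)
            cash -= bought * spot_price
            # Any remaining shortfall is fined and recorded as debt.
            cash -= (shortfall - bought) * fine
    # The initial reserve is returned at the end of the evaluation.
    return cash - initial_cash
```

For example, `simulate_trading([100], [80])` sells 100 kWh for 1,000 cents and buys the missing 20 kWh for 400 cents, leaving a profit of 600 cents. Over-forecasting is therefore costly, while under-forecasting only forgoes revenue.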

Data set

The goal in this project is to forecast the total wind energy production for the Ile-de-France region surrounding Paris. The data for wind energy production comes from the French energy transmission authority Réseau de transport d’électricité (RTE). The data set energy-ile-de-france contains near-real-time wind production from RTE’s online database, following these specifications:

  • Energy production (kWh) in time steps of 1 hour, starting from January 1st 2017 at 00:00 UTC to the present time.
  • Wind forecast data for 8 major wind farms in the region (see table below), from 2 different wind models provided by Terra Weather (16 forecasts).
  • Input variables: wind speed (m/s) and wind direction (degrees from North; for example, a wind direction of 45 degrees means the wind blows from the northeast). Forecasts are updated every 6 hours and are interpolated to a 1 hour time base. All values for wind speed and direction are estimated from models.
Table 1: Information about the 8 major wind farms in the Ile-de-France region, which were used in the project.

Exploratory Data Analysis (EDA)

Before jumping into the Deep Learning model, we first need to extract insights from the data and work on data preparation. Data extraction and modeling were implemented with the Smojo programming language by Terra AI within the Autocaffe platform used throughout the competition.

Statistics and normalization

We start our analysis by obtaining descriptive statistics of the wind energy in our data set, as shown in Table 2. We emphasize that the data was extracted directly from Autocaffe and is already interpolated, so it does not contain any missing values.

The next step is to normalize the data so that all features have a similar range of values. This step is important to avoid biasing the network towards features with higher values, as well as to speed up the learning process. All features were standardized to zero mean and unit standard deviation, following the equation:

Xnorm = (X - Xmean) / stddev,

where Xnorm is the value of the normalized input feature, X is the original value, Xmean is the mean value of X and stddev is its standard deviation.
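
The standardization described above can be sketched in a few lines of Python with NumPy. This is only an illustration (the project itself used Smojo on Autocaffe), and the function name `standardize` is hypothetical.

```python
import numpy as np


def standardize(x):
    """Rescale a feature to zero mean and unit standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


# Hypothetical wind speeds in m/s, for illustration only.
speeds = [3.0, 5.0, 7.0, 9.0]
normalized = standardize(speeds)
```

In a forecasting setting, the mean and standard deviation would normally be computed on the training period only and then reused on the test period, so that no information from the future leaks into the inputs.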

When dealing with forecasting, it is important to define a metric by which one can measure how close the model predictions are to the real values obtained (actuals). Another important concept when dealing with time series is persistence, which is the assumption that the value of a quantity observed in the present will be the same in the future (T+X = T+0). Persistence is a trivial forecast model and, therefore, any credible time series model must at least beat the persistence error.

Assuming a Mean Absolute Error (MAE) metric, we identify a persistence of 0.65 for our data set, which is the first benchmark that our model has to overcome.
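
As an illustration of how a persistence benchmark can be computed, here is a minimal Python sketch. The function name `persistence_mae` and the default lead of 18 steps (chosen to match the trading lead time) are assumptions; the article does not state how the value of 0.65 was obtained.

```python
import numpy as np


def persistence_mae(series, lead=18):
    """MAE of the trivial forecast 'the value in `lead` steps equals today's'."""
    s = np.asarray(series, dtype=float)
    # Compare each value with the value observed `lead` steps earlier.
    return float(np.mean(np.abs(s[lead:] - s[:-lead])))
```

Any candidate model should reach a lower MAE than this number on the same (normalized) series and the same lead time.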

Table 2: Descriptive statistics for wind energy.

Wind Energy Time Series

We start our time series analysis with visualizations of the wind energy production over the years, using box plots in Figures 1 and 2. In this visualization, each box contains the central 50% of the data for the given period, the horizontal line inside each box corresponds to the median value, the whiskers extending from the box correspond to the maximum and minimum values (excluding outliers), and the circles correspond to outliers, i.e. points lying more than 1.5 times the box height away from the box.
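
For readers who want to reproduce the quantities behind such a box plot, the following Python sketch computes them with NumPy. It assumes the common convention of whiskers at the most extreme points within 1.5 times the interquartile range of the box; the function name `box_plot_stats` is hypothetical.

```python
import numpy as np


def box_plot_stats(values):
    """Quartiles, whisker ends and outliers under the 1.5 x IQR convention."""
    x = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1  # height of the box
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = x[(x >= lower_fence) & (x <= upper_fence)]
    outliers = x[(x < lower_fence) | (x > upper_fence)]
    return {
        "q1": float(q1),
        "median": float(median),
        "q3": float(q3),
        "whisker_low": float(inside.min()),
        "whisker_high": float(inside.max()),
        "outliers": outliers.tolist(),
    }


stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```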

Figure 1: Box plot for wind energy production (normalized) in Ile de France over 2017 and 2019.

From Figure 1, it is possible to identify an increase in wind energy production over the years. This general increase can be associated with more investment by the French government in its overall wind energy production. It may have happened through capacity increases in existing wind farms or through an increase in the number of wind farms, as one can see in Table 1 from the installation of the wind farms Angerville 1 and 2 in 2019 [7,8].

If we increase the resolution of our analysis over time and focus on the year 2019 as an example, we can see from Figure 2 that the energy production has a broader range during March and from September to December. In contrast, a decrease in energy production is identified from April to August, indicating a possible seasonal correlation.

Figure 2: Box plot for wind energy production (normalized) in Ile de France over the months of 2019.

Box plots are good for understanding the distribution. Looking now at a similar analysis of energy over the months of 2018 in Figure 3, we can better visualize the yearly behavior of wind energy production. We highlight the daily averages with orange dots for easier understanding. From the orange curve, one can see that January is the month with the highest wind energy production and August the one with the lowest. Similar behavior over time is identified in the other years. This information is later used to generate new features for our model optimization.

Figure 3: Monthly mean energy resample
Figure 4: Wind speed for Guitrancourt (vertical axis) over January (horizontal axis).

Wind speed and direction

After identifying some insights about the wind energy production over time, we can now investigate the features available in our data set.

We have the wind speed and wind direction for 8 different wind farms in Ile de France, for a total of 16 initial features.

Figure 5: Wind direction for Guitrancourt (vertical axis) over January (horizontal axis).

Figures 4 and 5 show the wind speed and wind direction, respectively, for the Guitrancourt wind farm over the month of January 2019. No clear insights are visible from the time series behavior of these two features.

We opted for the year of 2019 in this analysis, since it is the year in which all wind farms are available. Similar behavior was identified when investigating the other 7 wind farms in our data set.

Figure 6 zooms in on the energy production of Figure 3 over the same period of January 2019. Looking at all regions and the whole time period available in our data, we can now compare the peaks and valleys of energy production with the behavior of wind speed and direction (Figures 4 and 5). Although no clear correlation is visible between energy production and wind direction, we see similar patterns over time for energy production and wind speed.

Figure 6: Wind energy production (vertical axis) over days of January 2019 (horizontal axis).

In order to investigate such similarities further, Figure 7 shows the Pearson correlation diagram among all wind speed and direction features for all regions and over the whole time available in our data. In this diagram, higher values (red) indicate stronger correlations, whereas lower values (white) indicate weaker correlations. Results indicate no direct correlation between wind direction and energy, but a clear and strong correlation between wind speed and energy for all wind farms. Weaker correlations between wind speed and direction for the same farms can be identified, but not with a significant correlation score.
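
For reference, the Pearson correlation between two features can be computed as in the Python sketch below. The function name `pearson` and the sample values are hypothetical; the result agrees with `np.corrcoef`, which can be applied directly to a whole feature matrix to obtain a diagram like the one in Figure 7.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation coefficient between two equally long series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2)))


# Hypothetical values, for illustration only.
wind_speed = [2.0, 4.0, 6.0, 8.0]
energy = [1.0, 3.0, 2.0, 5.0]
r = pearson(wind_speed, energy)
```

Note that Pearson correlation only measures linear relationships, so a low score for wind direction (an angle that wraps around at 360 degrees) does not by itself prove that direction carries no information.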

Figure 7: Pearson correlation among initial features.

Finally, we can look at the distributions of the wind features in Figure 8. Histograms of wind speed (left) and direction (right) are shown for 4 wind farms (all wind farms present similar curves). We can see that the features do not follow a Gaussian distribution: the peak of the wind speed distribution is shifted to the left, and the wind direction distribution has 2 peaks. This indicates that a Mean Absolute Error metric is more suitable for model evaluation, as discussed in the next section.

Figure 8: Wind speeds and direction distribution for 4 wind farms in France over 2017 and 2019.

Deep Learning Model

The goal of this project is to forecast wind energy production for the Ile de France region. Time series forecasting is a common problem solved with Neural Networks and Deep Learning. You can check an overview on time series models [here](https://otexts.com/fpp2/accuracy.html/) and on the forecasting topic here.

We started our modeling task with a base Neural Network model, later increasing its complexity by adding different features and parameters.

Feature Engineering

Considering that the data is already normalized and set for training and testing, we move on to the feature engineering phase, in which we investigated possible modifications or combinations of our original features to improve the model’s performance. We drop the wind direction at individual wind farms as a feature, and instead search for more insightful features relevant for determining energy production.

Based on the discussion in the EDA section, the final set of features used in the best performing model is the following:

  • Spatial Average Features: wind speed (over all wind farm coordinates), wind direction (over all wind farm coordinates), wind speed vectors and wind direction vectors.
  • Time Features: number of hours from 6 AM UTC, number of months from January, number of months from April, seasons.
  • Weighted Nominal Features: current wind speed divided by the nominal speed of 12 m/s, weighted by nominal power output (see Table 1), for all wind farm locations and for all vectors averaged.
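
The article does not give exact formulas for these features, so the following Python sketch shows only one plausible way to build them. All function names, the use of vector components for the spatial average, and the circular definition of "hours from 6 AM" are assumptions made for illustration.

```python
import numpy as np

NOMINAL_SPEED = 12.0  # m/s, the nominal speed quoted in the article


def spatial_average_wind(speeds, directions_deg):
    """Average wind over farms via vector components (assumed approach)."""
    speeds = np.asarray(speeds, dtype=float)
    theta = np.deg2rad(np.asarray(directions_deg, dtype=float))
    # Components of the "blowing from" vector along east and north.
    east = np.mean(speeds * np.sin(theta))
    north = np.mean(speeds * np.cos(theta))
    mean_speed = float(np.hypot(east, north))
    mean_direction = float(np.rad2deg(np.arctan2(east, north)) % 360.0)
    return mean_speed, mean_direction


def weighted_nominal_speed(speeds, nominal_powers):
    """Speed relative to nominal speed, weighted by each farm's nominal power."""
    speeds = np.asarray(speeds, dtype=float)
    weights = np.asarray(nominal_powers, dtype=float)
    return float(np.sum(weights * speeds / NOMINAL_SPEED) / np.sum(weights))


def hours_from(hour, reference=6):
    """Circular distance in hours between `hour` and a reference hour (UTC)."""
    diff = abs(hour - reference) % 24
    return min(diff, 24 - diff)
```

Averaging through vector components avoids the wrap-around problem of angles: two farms reporting 350 and 10 degrees average to a direction close to north, whereas a plain arithmetic mean would give 180 degrees.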

Difference Model

Using a difference model when dealing with time series is extremely important to reduce noise in the data without changing maximum and minimum values. Moreover, differencing helps to capture more insights about the behavior of variables over time. For this reason, we averaged and differenced the variables over the following time windows:

  • Last 24, 48, 72 hours,
  • Last week, last two weeks (14 days),
  • Differences over 18 hours (the trading lead time): t36, t18, t0
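
The windows listed above can be illustrated with two small helper functions in Python. This is a sketch under assumptions: the names `trailing_mean` and `lagged_difference` are hypothetical, and the article does not specify exactly how the averages and differences were combined.

```python
import numpy as np


def trailing_mean(x, window):
    """Mean over the last `window` samples; NaN until enough history exists."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    csum = np.cumsum(np.insert(x, 0, 0.0))
    out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out


def lagged_difference(x, lag):
    """x(t) - x(t - lag) for lag >= 1; NaN for the first `lag` samples."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    out[lag:] = x[lag:] - x[:-lag]
    return out
```

With hourly data, `trailing_mean(energy, 24)` would give the average over the last 24 hours and `lagged_difference(energy, 18)` the change over one trading lead time, where `energy` stands for the hourly production series.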

Finally, we also investigated the impact of higher-order time contributions on the energy production forecast by introducing momentum and force. Momentum (m) and force (f) give more information about how the rate of change of a variable behaves over time. For a time-dependent variable s, an independent variable x, a time t and a time step h, the momentum and force are defined as follows:

s(t) = x(t) + x(t-h),

m(t) = s(t) - s(t-h),

f(t) = s(t) - 2s(t-h) + s(t-2h).

Momentum and force are important quantities to minimize lag between a model prediction and the actual values. We used this analysis only for wind energy production, given that the other features come from wind models.
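
The definitions of s, m and f given above translate directly into array operations. The Python sketch below is an illustration only; the function name `momentum_force` is hypothetical, and each returned array simply drops the initial samples for which not enough history is available.

```python
import numpy as np


def momentum_force(x, h=1):
    """Compute s, momentum m and force f from the definitions in the text."""
    x = np.asarray(x, dtype=float)
    s = x[h:] + x[:-h]                              # s(t) = x(t) + x(t-h)
    m = s[h:] - s[:-h]                              # m(t) = s(t) - s(t-h)
    f = s[2 * h:] - 2 * s[h:-h] + s[:-2 * h]        # f(t) = s(t) - 2s(t-h) + s(t-2h)
    return s, m, f


s, m, f = momentum_force([1, 2, 4, 7, 11])
```

Momentum is a first difference of s and force is a second difference, so f(t) equals m(t) - m(t-h); in the example above m is [3, 5, 7] and f is [2, 2].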

Neural Network

After analyzing and preprocessing the data, we identified insightful new features for our model in the previous section. Now we can finally build our neural network model. First, we already identified from the data behavior that a MAE metric is more suitable, given the non-Gaussian behavior of the data. We have also identified several new features to be inputs of our neural network.

When dealing with input features, there is always the risk of adding noise to the network or of slowing down the learning process when some input features are not relevant for the problem. For this reason, we used input scaling in order to select only relevant features to train our neural network.

Input scaling is a powerful method to reduce noise as well as the effect of outliers in a neural network. The method consists of multiplying the input features by (positive) scaling factors, letting the network learn which ones are relevant as part of the model’s training. After scaling the inputs, the inputs can be further clamped with a tanh of their values.
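
The forward pass of such an input scaling layer can be sketched in Python as follows. This is a simplified illustration: in a real network the scaling factors would be trainable parameters updated by the optimizer, and the function name `scale_and_clamp` is hypothetical.

```python
import numpy as np


def scale_and_clamp(features, scales):
    """Multiply each input feature by a non-negative factor, then apply tanh."""
    x = np.asarray(features, dtype=float)
    s = np.abs(np.asarray(scales, dtype=float))  # keep the factors non-negative
    return np.tanh(x * s)


# One sample with two features: the second factor is zero, so that feature
# is switched off, while tanh bounds the first one to the range (-1, 1).
out = scale_and_clamp([[10.0, 10.0]], [1.0, 0.0])
```

A factor driven towards zero during training effectively removes the corresponding feature, and the tanh limits the influence of extreme input values.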

Another important method for dimensionality reduction is an unsupervised learning technique called autoencoder. This method consists of a subnetwork that compresses and decompresses the input features in a bottleneck network that identifies possible correlations among input features. See more about the topic here.
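
To make the compress and decompress idea concrete, here is a minimal Python sketch of an autoencoder forward pass with untrained, random weights. The layer sizes (16 inputs, a bottleneck of 3) and the tanh activation are assumptions for illustration and are not taken from the model in Table 3.

```python
import numpy as np


def autoencoder_forward(x, w_enc, w_dec):
    """Encode inputs into a narrow bottleneck, then decode them again."""
    code = np.tanh(x @ w_enc)        # compressed representation
    reconstruction = code @ w_dec    # attempt to rebuild the inputs
    return code, reconstruction


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))                  # 4 samples, 16 input features
w_enc = rng.normal(scale=0.1, size=(16, 3))   # 16 features -> 3 bottleneck units
w_dec = rng.normal(scale=0.1, size=(3, 16))   # 3 bottleneck units -> 16 features
code, reconstruction = autoencoder_forward(x, w_enc, w_dec)
reconstruction_mae = float(np.mean(np.abs(reconstruction - x)))
```

Training would adjust `w_enc` and `w_dec` to minimize the reconstruction error, after which the bottleneck output `code` can serve as a lower-dimensional input to the main network.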

All the methods mentioned above were investigated under several parametrizations in order to find the best performing configuration for our forecast model. Similarly, we tested several neural network architectures, trying different numbers of layers, layer sizes, activation functions, optimizers and dropout regularization probabilities.

The best performing configuration has a score of 0.48, which beats persistence by 26% for the MAE metric, since (0.65 - 0.48) / 0.65 ≈ 0.26. The specific parameters and architecture used are summarized in Table 3 below (the output layer is not mentioned in the table) and discussed in the next section.

Table 3: Best Performance model.

Results

Figures 9 and 10 show the training and test losses for the autoencoder and the main neural network, respectively. The results show that although the model is able to beat persistence, the test loss is still high in comparison to the training loss, indicating learning challenges for the network. As learned throughout this project, high test losses can come from a network that is too large and/or from too much remaining noise in the data. Let’s continue thinking about it…

Figure 9: Training and Test losses (vertical axis) as a function of iteration for autoencoder.
Figure 10: Training and Test losses (vertical axis) as a function of iteration for the model.

From the loss results one could expect the model not to perform particularly well. However, if we now look at the results for lagged correlation in Figure 11, we identify very good results. Here, it is important to come back to the goal of the project: to maximize profit from the energy production forecast. As stated in the trading methodology at the beginning of the article, whenever the model fails to predict an excess or a shortfall, the traders lose profit. From this perspective, it makes sense that the model that profits the most is the one that best predicts the peaks and valleys in the time series, and not necessarily the one with high accuracy throughout the whole time series.

Figure 11: Lagged correlations for Train and Test data.

In this sense, what was done in this optimization analysis was to prioritize lagged correlation over loss in order to maximize profit. Depending on the problem in which time series forecast is employed, the logic may differ, as for example in the case in which outliers are not important.
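
The article does not define how the lagged correlation was computed on the platform, so the following Python sketch shows one common interpretation: the correlation between the actual series and the prediction shifted by a number of steps. The function name `lagged_correlation` and the sign convention for the lag are assumptions.

```python
import numpy as np


def lagged_correlation(actual, predicted, max_lag=5):
    """Correlation between actual(t) and predicted(t + lag) for each lag."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    result = {}
    for lag in range(max_lag + 1):
        if lag == 0:
            r = np.corrcoef(a, p)[0, 1]
        else:
            # Align each actual value with the prediction made `lag` steps later.
            r = np.corrcoef(a[:-lag], p[lag:])[0, 1]
        result[lag] = float(r)
    return result
```

Under this convention, a forecast that merely repeats old values peaks at a positive lag, while a forecast that tracks peaks and valleys on time peaks at lag 0.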

Finally, we show the Actuals vs Training and Actuals vs Test predictions in Figure 12. A clear linear relation is visible in both cases, indicating that the model predicts well and that it is actually learning rather than memorizing the training data. As discussed above, the model was optimized for lagged correlation rather than for loss, and for this reason a broader linear behavior is identified. This is a direct consequence of how our model was tailored.

Figure 12: Actual data versus Training (left) and Test (right) predictions.

Summary

In this project, we trained a neural network model for wind energy prediction in France. This model was later applied to maximize the profits of wind energy traders for real data during a period of several days. During the evaluation period, the models of different teams were compared in terms of profit generated.

Important things learned throughout the Datathon include:

  • Domain knowledge is crucial in order to identify relevant features for wind energy forecasting,
  • Detailed data analysis can help with insights on feature engineering and parameter choices,
  • Data preprocessing is crucial for a good performance of the model,
  • Reducing noise is a key component of time series analysis.

The model implemented by team Deep Delve is a neural network which predicts wind energy production and is capable of generating reasonable profits for traders and producers of energy. Possible improvements for the model include more experimentation on parameter tuning and neural network architectures, as well as more advanced Deep Learning techniques such as Recurrent Neural Networks, which are known to perform well in time series problems. A deeper feature engineering analysis could also reveal new important factors for wind energy prediction which were not considered in this project, such as altitude and even climate change effects.

The use of Deep Learning for understanding time series has a wide scope and the potential to impact many other fields, such as healthcare (for example, identifying the spread of diseases over time), economics, sales and the stock market, among others. In particular, in the field of renewable energies, it can also be used for forecasting demand and consumption of energy.

Check our short presentation in the link below 🙂


We would like to thank AI4Impact for organizing this insightful Datathon and for providing all competitors the opportunity to learn more about Time Series Forecasting, neural networks and renewable energies.


Deep Delve Team: Ching Nok Yee, Nancy Zhang, Rosana de Oliveira Gomes, Sijuade Oguntayo, Valerie Koh Hui Yi

Further References

[1] Renewable energy technologies. World Energy Assessment (2001)

[2] "12 Countries Leading the Way in Renewable Energy". Click Energy

[3] "Renewable Electricity Capacity And Generation Statistics June 2018". Internet Archive Way Back Machine

[4] Statistical-review-of-world-energy. bp

[5] "Global Installed Capacity in 2018". Global Wind Energy Council

[6] New record-breaking year for Danish wind power. Energinet

[7] This is how Europe is paving the way for a sustainable, green future. We Forum

[8] These countries are leading the transition to sustainable energy. We Forum

