
Building an LSTM-Based Model for Solar Energy Forecasting

Handling some of the design issues of LSTMs

Image Source: https://unsplash.com/photos/mG8sgwkMhCY

Solar energy is one of the most important constituents of alternative sources of clean and renewable energy. Forecasting solar energy generation is critical for downstream applications and integration with conventional power grids. Rather than measuring the photo-voltaic output of the solar cells, the radiation received from the sun is often estimated as a proxy for solar power generation. The quantity used to measure it is called Global Horizontal Irradiance (GHI), which includes both the direct radiation and the diffused radiation.

The approaches to forecasting can be broadly classified as follows:

Fig 1: Types of Forecasting Models a) Approach Based b) Forecasting Window Based c) Based on Number of Variables. ( Image Source: Author)

The different models are physical models (NWP), statistical models (ARIMA, GARCH), machine learning models (Random Forest, boosting), and deep learning models (RNN, LSTM, GRU). Based on the forecasting window, if it is less than thirty minutes it is very short-term forecasting, and if it is longer it is short-term forecasting. Short-term forecasting is more relevant right now in the context of India. If you want to get started with LSTMs, it is worth going through an introductory blog post first.

We wanted to use LSTM because of its capability to understand complex and non-linear patterns. While going through a lot of blogs and papers, we were unsure about a few things, so we planned to work through them systematically. The data was from 2016 and pertained to three solar power stations at Chennai, Howrah, and Ajmer. For each station, we took data from two seasons, namely rainy (the most volatile, because of cloud cover) and winter (the least volatile). It may also be relevant to point out that the stations cover the "Hot & Humid" and "Hot & Dry" climatic zones. For each of the issues, experiments were carried out and the results were analyzed. LSTM-type models expect data with a three-dimensional shape, as shown in the diagram below.

Fig 2a: Input data shape of LSTM type of models. (Image Source: Author)

Number of Input Features: This simply tells how many variables we are using for the prediction. Instead of using only past GHI values, we could also have used other meteorological variables such as temperature, humidity, etc.

Number of Timesteps: This is often called the input window size, i.e. how many past values of the series we are using for the prediction. For example, if we intend to predict the value at 10:00 AM, we may consider the values at 09:55 AM, 09:50 AM, 09:45 AM, 09:40 AM, 09:35 AM, and 09:30 AM. In this case, the window size is 6.

Batch Size: For all ML/DL models, we try to find the optimal parameters of the model using gradient descent. Rather than computing the gradient on the entire training data, it is often more efficient to compute it on smaller batches. The batch size parameter takes care of that.
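As a concrete illustration, the sliding-window reshaping described above can be sketched in NumPy. The function name and the toy series below are our own, not taken from the original experiments:

```python
import numpy as np

def make_windows(series, window_size):
    """Slice a 1-D series into overlapping input windows and next-step targets.

    Returns X with shape (samples, timesteps, features), as LSTM-type
    models expect, and y with shape (samples,).
    """
    X, y = [], []
    for i in range(len(series) - window_size):
        X.append(series[i:i + window_size])
        y.append(series[i + window_size])
    X = np.array(X)[..., np.newaxis]  # add the single-feature axis
    return X, np.array(y)

# Toy GHI-like series, one value every 5 minutes
ghi = np.arange(100, dtype=float)
X, y = make_windows(ghi, window_size=6)
print(X.shape)  # (94, 6, 1)
```

A batch of, say, 72 such samples would then be fed to the network at each gradient step.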

There is also a concept of output window size, depending on whether we are interested in just the next observation or in a few subsequent ones as well. This is shown in the diagram below. For a single observation we will have a single output node; otherwise, more than one output node.

Fig 2b: LSTM Network with an output window

Design Question 1: One of the claimed benefits of LSTM is that there is no need for time-series-related pre-processing such as removing trend and seasonality, yet the research community seems to apply pre-processing anyway. So we wanted to investigate whether preprocessing is required or not.

The experiment setup was simple enough. We compared the performance of both a) without preprocessing and b) with preprocessing (removing seasonality; there was no trend in the data). The result is given in the figure below.
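The article does not spell out the deseasonalizing method used. One common choice for solar irradiance data is to subtract the mean daily profile; the sketch below assumes that choice and uses a synthetic series:

```python
import numpy as np

def remove_daily_seasonality(series, samples_per_day):
    """Subtract the mean daily profile from the series.

    A common deseasonalizing choice for solar data; the original
    experiments may have used a different method. Assumes len(series)
    is a multiple of samples_per_day.
    """
    days = series.reshape(-1, samples_per_day)
    profile = days.mean(axis=0)            # average value at each time of day
    deseasoned = (days - profile).ravel()
    return deseasoned, profile

# Toy example: three identical "days" of four samples each
s = np.array([0.0, 5.0, 10.0, 2.0] * 3)
deseasoned, profile = remove_daily_seasonality(s, samples_per_day=4)
print(deseasoned)  # all zeros: the toy series was purely seasonal
```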

Fig 3: Comparing Models with raw and preprocessed data ( Image Source: Author)

It was evident that the LSTM trained on the raw time series gave better results. The performance is measured in terms of normalized RMSE and the explained variance score, which are standard metrics for a time-series prediction task.
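For reference, the two metrics can be written directly. Definitions vary slightly across papers; here nRMSE is normalized by the mean of the observations, which is one common convention:

```python
import numpy as np

def nrmse(y_true, y_pred):
    """RMSE normalized by the mean observed value (one common convention)."""
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return rmse / np.mean(y_true)

def explained_variance(y_true, y_pred):
    """1 - Var(residuals) / Var(observations); 1.0 means a perfect fit."""
    return 1.0 - np.var(y_true - y_pred) / np.var(y_true)

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])
print(nrmse(y_true, y_pred))               # lower is better
print(explained_variance(y_true, y_pred))  # closer to 1 is better
```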

Design Question 2: Many practitioners also seem to use LSTM in a non-time-series setup, i.e. they use previous values of the variable of interest but treat them as separate independent variables. Does that make sense?

The experiment setup was straightforward. In Setup 1, the data shape is (72, 1, 30): 30 features and a window size of 1. In Setup 2, the data shape is (72, 30, 1): one feature and a window size of 30.
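The two setups can be obtained from the same window matrix by a simple reshape; the random array below is synthetic, standing in for the 72 samples of the experiment:

```python
import numpy as np

# 72 samples, each holding 30 past GHI values (synthetic placeholder data)
windows = np.random.rand(72, 30)

# Setup 1: treat the 30 past values as 30 features at a single time step
X_features = windows.reshape(72, 1, 30)

# Setup 2: treat them as a length-30 sequence of a single feature
X_sequence = windows.reshape(72, 30, 1)

print(X_features.shape, X_sequence.shape)  # (72, 1, 30) (72, 30, 1)
```

The underlying numbers are identical in both cases; only the shape tells the LSTM whether to unroll over 1 step or 30 steps.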

Fig 4: Comparing models based on not time series and time series setup (Image Source: Author)

We had been genuinely confused by the many blogs and research papers treating the time steps as separate features; fortunately, the results were much better for the time-series setup.

Design Question 3: A general intuition is that as we increase nodes or layers, the performance should always increase. But with that, the training time as well as the amount of training data needed also increases. The question was: how true is this intuition, and where should we stop?

Here, we used a very simple idea. Any learner will need to learn more if they are given more difficult questions. By the same analogy, a network will need more capacity (in terms of layers, nodes, etc.) if the data is more difficult to handle. An obvious follow-up question is: how do we measure this difficulty? We reasoned that the more variability the input data displays, the more difficult the prediction is going to be. So, we first measured the variability of GHI for each station and then trained and evaluated networks with different numbers of nodes. The result is given below.
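The article does not state the exact variability measure. One simple candidate is the standard deviation of the GHI series per station-season combination (the coefficient of variation would be another option); the sketch below assumes that choice:

```python
import numpy as np

def variability(ghi):
    """Standard deviation of a GHI series — one simple way to score how
    'difficult' a station-season combination is. The original study's
    exact measure is not specified here."""
    return float(np.std(ghi))

calm = np.full(100, 500.0)                 # flat series: no variability
volatile = np.linspace(0.0, 1000.0, 100)   # strongly varying series
print(variability(calm), variability(volatile))
```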

Fig 5: Should we go on increasing the complexity of the network (Image Source: Authors)

For station-season combinations with a variability around or below 50, fewer nodes sufficed, and when the variability was above that range, about 100 nodes were needed to reach the optimal prediction. The graph below reproduces the table above.

Fig 6: Number of Nodes and Performance

Conclusion:

LSTM can be a good model for solar forecasting. It is advisable to use the raw time series and to treat the inputs as time-series data rather than considering each time step as a separate attribute. More nodes do not necessarily mean better performance: more complex scenarios need more complex structures, and for both simple and complex scenarios the learning saturates after a certain number of nodes (possibly because overfitting sets in).

Acknowledgement:

This is the outcome of Sourav Malakar’s work as a Ph.D. scholar at the Calcutta University Data Science Lab (set up by the A.K.Choudhury School of IT and the Department of Statistics). Many rounds of discussions with Prof. Amlan Chakrabarti and Prof. Bhaswati Ganguli helped a lot. We are also grateful to the National Institute of Wind Energy (NIWE) for its domain knowledge, and last but not least to the LISA (Laboratory of Interdisciplinary Statistical Analysis, https://sites.google.com/site/lisa2020network/about) program, which actually made the group more cohesive. Interested readers can read the full article published in Springer Nature Applied Sciences here.
