Time Series Forecasting of Solar Radiation

How well can we predict solar radiation with one year’s worth of training data?

Suman Gautam
Towards Data Science


Image by author

Solar-powered energy generation is on the rise in the United States. According to the U.S. Energy Information Administration (EIA), the state of Texas is planning to add 10 gigawatts (GW) of utility-scale solar capacity by the end of 2022. This growth also increases the need for power management, both short-term and long-term. Can we use solar-related data to make short- and long-term forecasts that help us monitor, manage, and distribute electricity generation?

Solar irradiance is the power per unit area received from the Sun, and it is the standard quantity used to measure solar radiation. It can be measured in several ways, including Total Solar Irradiance, Spectral Solar Irradiance, and Global Horizontal Irradiance. Measurements are made primarily by satellites or by surface sensors. In this study we use surface sensor measurements.

The National Oceanic and Atmospheric Administration (NOAA) maintains several measurement sites that record solar radiation at the Earth's surface. These are typical time series data, so we can use machine learning to forecast future irradiance. Forecasts of solar irradiance can help estimate how much energy solar power generators might produce over the coming days, weeks, or even months. This is useful for managing power distribution across different kinds of sources.

Data

Daily surface radiation data from seven stations across the United States were obtained from the NOAA data FTP site:
https://gml.noaa.gov/aftp/data/radiation/surfrad/

The data are stored as one file per day, i.e., 365–366 files per year per location, so downloading them by hand is tedious. Below is a small script I wrote to automate the download by location and year, in case you want to use this data for your own analysis:
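Here is a minimal standard-library sketch of such a downloader. It is not the author's script. It assumes the archive is organized as `<station directory>/<year>/<prefix><yy><day-of-year>.dat`, for example `Bondville_IL/2020/bon20001.dat`. The station directory names and three-letter prefixes in `STATIONS` are assumptions, so check them against the directory listing at the URL above before relying on them.

```python
import datetime as dt
import os
import urllib.request

BASE_URL = "https://gml.noaa.gov/aftp/data/radiation/surfrad"

# ASSUMPTION: directory names and file prefixes as they appear in the archive
# listing; verify them on the site before use.
STATIONS = {
    "Bondville_IL": "bon",
    "Boulder_CO": "tbl",
    "Desert_Rock_NV": "dra",
    "Fort_Peck_MT": "fpk",
    "Goodwin_Creek_MS": "gwn",
    "Penn_State_PA": "psu",
    "Sioux_Falls_SD": "sxf",
}


def daily_file_urls(station, year):
    """Build one URL per day of the year (365 or 366 files)."""
    prefix = STATIONS[station]
    start = dt.date(year, 1, 1)
    n_days = (dt.date(year + 1, 1, 1) - start).days
    yy = year % 100  # two-digit year used in the file name (assumed convention)
    return [
        f"{BASE_URL}/{station}/{year}/{prefix}{yy:02d}{doy:03d}.dat"
        for doy in range(1, n_days + 1)
    ]


def download_year(station, year, out_dir="surfrad_data"):
    """Download every daily file for one station and year, skipping existing files."""
    target = os.path.join(out_dir, station, str(year))
    os.makedirs(target, exist_ok=True)
    for url in daily_file_urls(station, year):
        path = os.path.join(target, url.rsplit("/", 1)[-1])
        if os.path.exists(path):
            continue  # already downloaded on a previous run
        try:
            urllib.request.urlretrieve(url, path)
        except OSError as err:  # URLError and HTTPError are OSError subclasses
            print(f"skipped {url}: {err}")
```

For example, `download_year("Bondville_IL", 2020)` would fetch one year for one station. Looping over `STATIONS` and a range of years covers the multi-year case.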

Data Exploration

The raw data are recorded at one-minute intervals (1,440 records per day), which adds up to more than 520,000 observations per location per year. Using multiple years quickly produces a large dataset. In this study, the data were resampled to hourly and daily averages. This reduced the size significantly and made it possible to train fairly large neural networks on a small PC. The data contain more than 28 features, but for this study we chose only one major feature, ‘netsolar’ radiation, as a univariate time series for forecasting. Because solar radiation is strongly seasonal, it should lend itself well to modelling with machine learning algorithms.
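In pandas, resampling to hourly and daily means takes a single line. The sketch below assumes the parsed records sit in a DataFrame with a DatetimeIndex and a `netsolar` column. The two days of one-minute data are synthetic, and `resample_netsolar` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def resample_netsolar(df, rule):
    """Average the 'netsolar' column over each `rule` bin (e.g. '60min' or 'D')."""
    return df["netsolar"].resample(rule).mean()


# Synthetic stand-in for two days of 1-minute records (not real SURFRAD data):
# zero at night, a sine-shaped bump between 06:00 and 18:00.
idx = pd.date_range("2020-06-01", periods=2 * 1440, freq="min")
minutes_of_day = idx.hour.to_numpy() * 60 + idx.minute.to_numpy()
values = np.clip(np.sin(np.pi * (minutes_of_day - 360) / 720), 0, None) * 800
df = pd.DataFrame({"netsolar": values}, index=idx)

hourly = resample_netsolar(df, "60min")  # 48 hourly means
daily = resample_netsolar(df, "D")       # 2 daily means
```

Night-time zeros are kept, so each daily mean includes the dark hours. That is why daily averages sit well below midday peaks.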

Below, we can see the seasonal variation of solar radiation in 2020, smoothed with a 10-day rolling average. As expected, radiation is highest in summer, which produces the hump in the middle of the plot. Radiation is also periodic on a daily basis (the day-night cycle), so this study focuses on short-term forecasts (hourly up to 10 days).
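A 10-day rolling average like the one in the plot can be computed with `Series.rolling`. Below is a minimal sketch on synthetic daily values; `rolling_mean` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def rolling_mean(daily, window_days=10):
    """Trailing rolling mean: each value averages the current day and the previous window_days - 1 days."""
    return daily.rolling(window=window_days).mean()


# Synthetic daily series 0, 1, ..., 29 (not real measurements)
idx = pd.date_range("2020-01-01", periods=30, freq="D")
daily = pd.Series(np.arange(30, dtype=float), index=idx)
smoothed = rolling_mean(daily)
```

The first nine values are NaN because the window is not yet full. Passing `center=True` to `rolling` would align each average with the middle of its window instead of its last day, which is often preferable for plotting.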

Modelling

My initial goal for this study was to build a model for short-term forecasts, from a few hours up to about 10 days. Therefore, I used only one year of data for training (measurements from 2020) and used that model to predict the 2021 data. The full 2021 record was not yet available (as I write this article, today is June 16, 2021), so it also made sense to forecast an unknown period beyond this date. For hourly forecasts the data were resampled to 1-hour averages, and for daily forecasts to 1-day averages.
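Here is a minimal sketch of that year-based split. It assumes the resampled data sit in a pandas Series with a DatetimeIndex. The series below is synthetic, and `split_by_year` is a hypothetical helper. Splitting by calendar year, rather than shuffling, keeps every test point later than every training point, which matches what a real forecast faces.

```python
import numpy as np
import pandas as pd


def split_by_year(series, train_year, test_year):
    """Chronological split: one calendar year for training, a later one for testing."""
    train = series.loc[str(train_year)]  # partial-string indexing on a DatetimeIndex
    test = series.loc[str(test_year)]
    return train, test


# Synthetic daily series covering 2020 and 2021 up to June 15 (not real measurements)
idx = pd.date_range("2020-01-01", "2021-06-15", freq="D")
series = pd.Series(np.arange(len(idx), dtype=float), index=idx, name="netsolar")
train, test = split_by_year(series, 2020, 2021)
```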

Our modelling started with conventional techniques such as ARIMA and SARIMAX. However, these models did not generalize well on this data, perhaps because one year of data is not enough. Furthermore, the measurements fluctuate strongly, which might add extra noise to the model. (One thing still to try is training on a rolling average, which I haven’t done yet!) Neural network architectures such as LSTM, on the other hand, seemed to perform very well on this time series.

Long Short Term Memory (LSTM)

LSTMs are recurrent neural networks that can retain information from earlier steps in a sequence. This property makes them very useful for predicting sequential data. The details of LSTMs are beyond the scope of this article, but they are widely documented online. In the next section, I show preliminary results from applying an LSTM to the radiation time series.
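For reference, the standard (textbook) LSTM cell update is written out below. Here $x_t$ is the input at step $t$, $h_{t-1}$ the previous hidden state, $c_t$ the cell state (the "memory"), $\sigma$ the logistic sigmoid, and $\odot$ element-wise multiplication. The article does not say which LSTM variant its model uses, so treat this as the generic formulation.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{forget gate}\\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{input gate}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{output gate}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{candidate memory}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t\\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

The forget gate $f_t$ controls how much of the previous cell state $c_{t-1}$ is kept. This gate is what lets the network carry information, such as the shape of the daily cycle, across many time steps.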

Below is an example of the LSTM architecture used for the final model. If you are interested in the full code, I provide a GitHub link at the end of this article.
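Whatever the exact layers, an LSTM layer in Keras expects input shaped (samples, timesteps, features). Here is a minimal NumPy sketch for turning the univariate `netsolar` series into such windows. `make_windows`, `n_lags` (the look-back length) and `horizon` are illustrative names, not taken from the author's code.

```python
import numpy as np


def make_windows(series, n_lags, horizon=1):
    """Slice a 1-D series into supervised (X, y) pairs for a univariate LSTM.

    X has shape (samples, n_lags, 1): n_lags past values, one feature.
    y has shape (samples, horizon): the values that follow each window.
    """
    values = np.asarray(series, dtype=float)
    n_samples = len(values) - n_lags - horizon + 1
    if n_samples <= 0:
        raise ValueError("series is too short for the requested n_lags and horizon")
    X = np.stack([values[i:i + n_lags] for i in range(n_samples)])
    y = np.stack([values[i + n_lags:i + n_lags + horizon] for i in range(n_samples)])
    return X[..., np.newaxis], y
```

For example, `make_windows(np.arange(10), n_lags=3, horizon=2)` yields six windows: the first maps `[0, 1, 2]` to `[3, 4]`, and the last maps `[5, 6, 7]` to `[8, 9]`.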

Training Evaluation

Before making any forecasts, let’s look at how the model’s predictions compare with the actual data. The model performed well for short-term predictions, but its performance deteriorated as we increased the lag period.
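The article does not name its error metric. One common way to see how accuracy changes with the number of steps ahead is to compute the root-mean-square error separately for each step. The sketch below assumes true and predicted values are arrays shaped (samples, horizon); `rmse_by_horizon` is a hypothetical helper.

```python
import numpy as np


def rmse_by_horizon(y_true, y_pred):
    """Root-mean-square error for each forecast step (one value per column)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2, axis=0))
```

If the errors for later columns are clearly larger than for the first, the model is losing skill as the horizon grows.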

Future Forecast

Throughout this study, the most challenging part was figuring out how to forecast the unknown period. Plenty of material online compares predicted time series against training and test data, but only a few sources show actual forecasts into the unknown. Working code is particularly hard to find, and honestly, I struggled to make my code work for this part. Below is an example of a function I used to make the forecast. If you want to use this code, you will need both blocks of code shown below:
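One common way to forecast into an unknown period is the recursive strategy: predict one step, append that prediction to the input window, and repeat. Here is a model-agnostic sketch of that idea, not necessarily what the author's function does. `recursive_forecast` and its `predict_one` callback are hypothetical names; the callback is where a trained model would plug in.

```python
from typing import Callable

import numpy as np


def recursive_forecast(
    predict_one: Callable[[np.ndarray], float],
    history,
    n_lags: int,
    n_steps: int,
) -> np.ndarray:
    """Forecast n_steps ahead by feeding each prediction back in as input.

    predict_one receives an array shaped (1, n_lags, 1) and returns one value.
    """
    window = list(np.asarray(history, dtype=float)[-n_lags:])
    if len(window) < n_lags:
        raise ValueError("history must contain at least n_lags values")
    forecasts = []
    for _ in range(n_steps):
        x = np.asarray(window, dtype=float).reshape(1, n_lags, 1)  # (samples, timesteps, features)
        next_value = float(predict_one(x))
        forecasts.append(next_value)
        window = window[1:] + [next_value]  # slide the window forward by one step
    return np.asarray(forecasts)
```

With a Keras model, `predict_one` could be `lambda x: model.predict(x, verbose=0)[0, 0]`. Any scaling applied during training would also need to be inverted on the returned values. Because each prediction is fed back in, errors compound with every step, which is one plausible reason long horizons degrade.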

Here is an example of what the resulting dataframe looks like:
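Here is a minimal sketch of attaching future timestamps to an array of forecasts with `pd.date_range`. The column name `forecast` and the helper `forecast_frame` are hypothetical, and the numbers in the example are placeholders, not model output. The frequency string should match the resampling used for training (hourly or daily).

```python
import numpy as np
import pandas as pd


def forecast_frame(forecasts, last_timestamp, freq="60min"):
    """Index forecasts by the timestamps that follow the last observed one."""
    # periods + 1 because date_range includes the start, which is already observed
    index = pd.date_range(start=last_timestamp, periods=len(forecasts) + 1, freq=freq)[1:]
    return pd.DataFrame({"forecast": np.asarray(forecasts, dtype=float)}, index=index)


hourly_df = forecast_frame([410.0, 395.5, 120.2], "2021-06-16 23:00")
daily_df = forecast_frame([1.0, 2.0], "2021-06-16", freq="D")
```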

After this, you can use any visualization library to plot the result. I find the plotly library especially useful because it lets you zoom in and out, which is very handy for time series data.

Forecast Result

Finally, it’s time to look at the forecasting results. I tested 24-hour, 10-day, and 30-day forecasts with this model. Overall, the 24-hour and 10-day forecasts were reasonably good. The 30-day forecast performed poorly, which is expected because only one year of data was used for training.

Prediction Result from Daily Average Data (10-day rolling average)
Zoomed Section
Prediction Result from Hourly Average Data

Conclusions

  • The current study shows promising results for short-term and medium-term (up to 10 days) forecasting. Longer horizons will require training the model on more historical data. The download script makes it easy to handle the large number of files involved, so more historical data can readily be added to the analysis.
  • The short-term fluctuations in the current data are large. Would traditional models like SARIMAX perform better if trained on rolling-averaged data?
  • Finally, the radiation data contain plenty of additional features, so it would be interesting to see whether multivariate forecasting performs better. That is something to test in the future!

GitHub link:

Linkedin: https://www.linkedin.com/in/suman-gautam-9091572b/
