Stream Learning in Energy IoT Systems

Jesus L. Lobo
Towards Data Science
16 min read · Mar 28, 2020

--

A Case Study in Combined Cycle Power Plants

Image taken from https://www.pxfuel.com/en/free-photo-ebrro, licensed under Creative Commons Zero (CC0).

The prediction of the electrical power produced in combined cycle power plants is a key challenge in the electrical power and energy systems field. This power production can vary depending on environmental variables, such as temperature, pressure, and humidity. Thus, the business problem is how to predict power production as a function of these environmental conditions in order to maximize profit. The research community has addressed this problem by applying Machine Learning techniques and has managed to reduce the computational and time costs compared with the traditional thermodynamical analysis. Until now, this challenge has been tackled from a batch learning perspective, in which data are assumed to be at rest and models do not continuously integrate new information into already constructed models. This article presents an approach closer to the Big Data and IoT paradigms, in which data arrive continuously and models learn incrementally, achieving significant improvements in data processing (time, memory, and computational costs) while obtaining competitive predictive performance. This article compares the hourly electrical power predictions of several streaming regressors and discusses which technique offers the best trade-off between processing time and predictive performance in this streaming scenario.

Introduction

The efficiency of Combined Cycle Power Plants (CCPPs) is a key issue in the penetration of this technology in the electricity mix. A recent report [1] has estimated that, in the next decade, the number of projects involving combined cycle technology will increase by 3.1%, an estimate mainly based on the high efficiency of CCPPs. The prediction of electrical power production in CCPPs encompasses numerous factors that have to be considered to achieve an accurate estimation. The operators of a power grid often predict the power demand based on historical data and environmental factors, such as temperature, pressure, and humidity. Then, they compare these predictions with available resources, such as coal, natural gas, nuclear, solar, wind, or hydropower plants. Power generation technologies (e.g., solar and wind) are highly dependent on environmental conditions, and all generation technologies are subject to planned and unplanned maintenance. Thus, the challenge for a power grid operator is how to handle a shortfall in available resources versus actual demand. The power production of a peaker power plant varies depending on environmental conditions, so the business problem is to predict the power production of the peaker as a function of meteorological conditions, since this would enable the grid operator to make economic trade-offs about the number of peaker plants to turn on (or whether to buy usually expensive power from another grid).

The CCPP considered in this article uses two gas turbines (GT) and one steam turbine (ST) together to produce up to 50% more electricity from the same fuel than a traditional simple-cycle plant. The waste heat from the GTs is routed through two heat recovery steam generators to the ST, which generates extra power. In this real environment, a thermodynamical analysis involves thousands of nonlinear equations whose solution is nearly infeasible, demanding excessive memory and computation time. This barrier is overcome by using a Machine Learning based approach, which is a frequent alternative to thermodynamical approaches [2]. The correct prediction of the plant's electrical power production is very relevant for its efficiency and economic operation, and maximizes the income from the available megawatt hours. The sustainability and reliability of the GTs depend highly on this electrical power production prediction, especially when the plant is subject to constraints of high profitability and contractual liabilities.

Our view is closer to a real system in which fast data can be huge, is in motion, is closely connected, and must be processed with limited resources (e.g., time, memory). Since it does not seem feasible to retrain the learning algorithms from scratch every time new instances arrive (as occurs in batch processing), a streaming perspective brings significant improvements in data processing (lower time and computational costs) and in algorithm training (models are updated every time new instances arrive), and it presents a modernized vision of a CCPP as an IoT application and as a part of the Industry 4.0 paradigm.

The relevance of a Stream Learning approach

The Big Data paradigm has gained momentum over the last decade because of its promise to deliver valuable insights to many real-world applications [3]. With the advent of this emerging paradigm comes not only an increase in the volume of available data, but also the notion of its arrival velocity: these real-world applications generate data in real time at rates faster than traditional systems can handle. One particular case of the Big Data paradigm is real-time analytics or Stream Learning (SL), where sequences of items (data streams), possibly infinite, arrive continuously, and where each item has a timestamp and thus a temporal order. Data streams arrive one by one, and we would like to build and maintain models (e.g., predictors) of these items in real time.

This situation leads us to assume that we have to deal with potentially infinite and ever-growing datasets that may arrive continuously in batches of instances, or instance by instance, in contrast to traditional systems (batch learning) where there is free access to all historical data. These traditional processing systems assume that data are at rest and simultaneously accessed. For instance, database systems can store large collections of data and allow users to run queries or transactions. Models based on batch processing do not continuously integrate new information into already constructed models but, instead, regularly reconstruct new models from scratch. However, the incremental learning carried out by SL presents advantages for this kind of stream processing by continuously incorporating information into its models, and it traditionally aims at minimal processing time and space. Because of its ability to perform continuous large-scale and real-time processing, incremental learning has recently gained more attention in the context of Big Data. SL also presents many new challenges and poses stringent conditions:

  • only a single sample (or a small batch of instances) is provided to the learning algorithm at every time instant,
  • a very limited processing time,
  • a finite amount of memory, and
  • the necessity of having trained models at every scan of the streams of data.

In addition, these streams of data may evolve over time and may be occasionally affected by a change in their data distribution (concept drift) [4], forcing the system to learn under non-stationary conditions.
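
As an illustration only (this is not the detection mechanism used in the article), a simple way to monitor for such a change is to compare the recent prediction error against a reference window and raise a flag when the gap grows too large; the window size and threshold below are arbitrary choices:

```python
import statistics
from collections import deque

def drift_suspected(errors: deque, window: int = 200, threshold: float = 2.0) -> bool:
    """Flag a possible concept drift by comparing the mean error of the most
    recent window against the mean error of the preceding reference window."""
    if len(errors) < 2 * window:
        return False  # not enough history yet
    history = list(errors)
    recent, reference = history[-window:], history[-2 * window:-window]
    gap = abs(statistics.mean(recent) - statistics.mean(reference))
    return gap > threshold * (statistics.pstdev(reference) + 1e-9)
```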

We can find many examples of real-world SL applications, such as mobile phones, industrial process control, intelligent user interfaces, intrusion detection, spam detection, fraud detection, loan recommendation, and monitoring and traffic management, among others. In this context, the Internet of Things (IoT) has become one of the main applications of SL [5], since it produces huge quantities of data continuously in real time. Therefore, stream data analysis is becoming a standard way to extract useful knowledge from what is happening at each moment, allowing people and organizations to react quickly when problems emerge or new trends appear, helping them improve their performance.

System description

The proposed CCPP is composed of two GTs, one ST, and two heat recovery steam generators. In a CCPP, electricity is generated by GTs and STs combined in one cycle, with the heat from one turbine transferred to the other. The CCPP captures waste heat from the GTs to increase efficiency and electrical production. Basically, how a CCPP works can be described as follows (see Figure 1):

  • Gas turbine burns fuel: the GT compresses air and mixes it with fuel that is heated to a very high temperature. The hot air-fuel mixture moves through the GT blades, making them spin. The fast-spinning turbine drives a generator that converts a portion of the spinning energy into electricity
  • Heat recovery system captures exhaust: a Heat Recovery Steam Generator captures exhaust heat from the GT that would otherwise escape through the exhaust stack. The Heat Recovery Steam Generator creates steam from the GT exhaust heat and delivers it to the ST
  • Steam turbine delivers additional electricity: the ST sends its energy to the generator drive shaft, where it is converted into additional electricity

This type of CCPP is being installed in an increasing number of plants around the world where there is access to substantial quantities of natural gas. As reported in [6], the proposed CCPP is designed with a nominal generating capacity of 480 megawatts, made up of 2 × 160 megawatt ABB 13E2 GTs, 2 × dual-pressure Heat Recovery Steam Generators, and 1 × 160 megawatt ABB ST. The GT load is sensitive to the ambient conditions, mainly ambient temperature (AT), atmospheric pressure (AP), and relative humidity (RH), while the ST load is sensitive to the exhaust steam pressure (or vacuum, V). These parameters of both GTs and STs are used as input variables, and the electrical power generated by both GTs and STs is used as the target variable in the dataset of this study. All of them are described in Table 1 and correspond to average hourly data received from the measurement points of the sensors denoted in Figure 1.

Figure 1. Layout of the combined cycle power plant based on [6]. HP is High Pressure, LP is Low Pressure, D is Drum, G is Generator, SH is Super Heater, E is Evaporator, EC is Economizer, and HRSG is Heat Recovery Steam Generator. AT, AP, RH, V and PE are the variables described in Table 1. The image belongs to our open access publication at https://www.mdpi.com/1996-1073/13/3/740/htm.
Table 1. Input and target variables of the dataset.
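
For readers who want to set up a similar experiment, the hourly averages can be loaded into a feature matrix and a target vector along the lines below; the file name is hypothetical, and the column names follow Table 1:

```python
import pandas as pd

# Hypothetical export of the hourly averages; columns follow Table 1.
df = pd.read_csv("ccpp_hourly_averages.csv")  # AT, V, AP, RH, PE

X_stream = df[["AT", "V", "AP", "RH"]].to_numpy()  # ambient and exhaust conditions
y_stream = df["PE"].to_numpy()                     # electrical power output
```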

Our Stream Learning approach

When designing SL algorithms, we have to take several algorithmic and statistical considerations into account. For example, we have to face the fact that, as we cannot store all the inputs, we cannot unwind a decision made on past data. In batch learning we have free access to all historical data gathered during the process, and we can then apply preparatory techniques to the dataset, such as pre-processing, variable selection, or statistical analysis, among others (see Figure 2). The problem with stream processing, however, is that there is no access to the whole past dataset, and we have to opt for one of the following strategies.

The first one is to carry out the preparatory techniques every time a new batch of instances, or a single instance, is received, which increases the computational cost and processing time; moreover, the process flow often cannot be paused to carry out this preparation because new instances keep arriving, which makes it a challenging task. The second one is to store a first group of instances (preparatory instances), carry out those preparatory techniques and the data stream analysis on them, and apply the conclusions to the incoming instances. This latter strategy is very common when streaming is applied to a real environment, and it is the one adopted in this work. This article will show later how the choice of the size of this first group of instances (which may depend on the available memory or on the time we can spend collecting and processing these data) can be crucial for achieving competitive performance over the rest of the stream.

Figure 2. Scheme of the SL process of this work. The image belongs to our open access publication in https://www.mdpi.com/1996-1073/13/3/740/htm.

Once these first instances have been collected, this article applies three common preparatory techniques before the streaming process starts in order to prepare our Stream Regression algorithms (SRs): variable selection, hyper-parameter tuning, and pre-training.
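
A minimal sketch of this preparatory phase with scikit-learn-style estimators is shown below; the selector, parameter grid, and base model are illustrative choices rather than the exact ones of the original study, and `X_stream`/`y_stream` are the arrays loaded earlier:

```python
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import GridSearchCV

# Preparatory buffer: the first instances of the stream (here, 5% of it).
n_prep = int(0.05 * len(X_stream))
X_prep, y_prep = X_stream[:n_prep], y_stream[:n_prep]

# 1) Variable selection fitted on the preparatory buffer only.
selector = SelectKBest(score_func=f_regression, k=2).fit(X_prep, y_prep)
X_prep_sel = selector.transform(X_prep)

# 2) Hyper-parameter tuning restricted to the same buffer.
grid = GridSearchCV(SGDRegressor(), {"alpha": [1e-4, 1e-3, 1e-2]}, cv=3)
grid.fit(X_prep_sel, y_prep)

# 3) Pre-training: GridSearchCV refits the best estimator on the whole buffer,
#    and this model will then be updated incrementally during the stream.
model = grid.best_estimator_
```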

We train and test our algorithms only with the arriving instances by using a test-then-train evaluation (see Figure 3). Data stream regression is usually evaluated in the online setting depicted in Figure 3, where data is not split into training and testing sets. Instead, each model first predicts the incoming instance, which is afterwards used to update the model.

Figure 3. Stream learning (online learning) scheme with test-then-train evaluation. The image belongs to our open access publication in https://www.mdpi.com/1996-1073/13/3/740/htm.
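
In code, the test-then-train loop amounts to predicting each incoming instance before using it to update the model; continuing the illustrative snippets above (with `selector`, `model`, and `n_prep` from the preparatory phase), a minimal sketch is:

```python
from sklearn.metrics import mean_absolute_error

predictions, targets = [], []
for x_t, y_t in zip(X_stream[n_prep:], y_stream[n_prep:]):
    x_sel = selector.transform(x_t.reshape(1, -1))  # keep the selected variables
    y_hat = model.predict(x_sel)[0]                 # 1) test on the new instance
    model.partial_fit(x_sel, [y_t])                 # 2) then train with it
    predictions.append(y_hat)
    targets.append(y_t)

print("prequential MAE:", mean_absolute_error(targets, predictions))
```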

Definition of the problem

The prediction of the electrical power produced is tackled as a regression problem. A SL algorithm, like every Machine Learning method, estimates an unknown dependency between the independent input variables and a dependent target variable from a dataset. In our article, SRs predict the electrical power generation of a CCPP from a dataset consisting of couples (xₜ, yₜ) (i.e., instances), and they build a mapping function:

ŷₜ = f(xₜ)

by using these couples. Their goal is to select the best function that minimizes the error between the actual production (yₜ) of the system and the predicted production (ŷₜ), based on instances of the dataset (training instances).
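
Under a squared-error criterion, for instance, this goal can be written as choosing the function (a standard formulation given here for clarity, not quoted from the paper):

f* = arg min_f Σₜ (yₜ − f(xₜ))²

where the sum runs over the training instances seen so far.
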
This article offers a comparison of stream learners because, under these real-time machine learning conditions, we need regression methods that learn incrementally. It then identifies the best technique for the presented scenario, real-time electrical power production prediction in a CCPP. The following SRs have been considered (a minimal instantiation sketch follows the list):

  • Passive-Aggressive Regressor (PAR)
  • Stochastic Gradient Descent Regressor (SGDR)
  • Multi-Layer Perceptron Regressor (MLPR)
  • Regression Hoeffding Tree (RHT)
  • Regression Hoeffding Adaptive Tree (RHAT)
  • Mondrian Tree Regressor (MTR)
  • Mondrian Forest Regressor (MFR)
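
As an orientation only, the scikit-learn members of this list support incremental updates through `partial_fit`; the Hoeffding and Mondrian regressors are assumed to come from streaming libraries such as scikit-multiflow and scikit-garden, whose class names and import paths vary by version:

```python
from sklearn.linear_model import PassiveAggressiveRegressor, SGDRegressor
from sklearn.neural_network import MLPRegressor

# Incremental regressors available in scikit-learn (all expose partial_fit).
stream_regressors = {
    "PAR": PassiveAggressiveRegressor(),
    "SGDR": SGDRegressor(),
    "MLPR": MLPRegressor(hidden_layer_sizes=(50,)),
    # Assumed import paths, to be checked against the installed library versions:
    # "RHT":  skmultiflow.trees.HoeffdingTreeRegressor(),
    # "RHAT": skmultiflow.trees.HoeffdingAdaptiveTreeRegressor(),
    # "MTR":  skgarden.MondrianTreeRegressor(),
    # "MFR":  skgarden.MondrianForestRegressor(),
}
```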

The experiments

This article designs an extensive experimental benchmark to find the most suitable SR method for electrical power prediction in CCPPs by comparing 7 widely used SRs in terms of error metrics and processing time. The benchmark is divided into 4 experiments (see Table 2), which consider two preparatory sizes and two variable selection options. The idea is to observe the impact of the number of instances selected for the preparatory phase once the streaming process finishes, and also to test the relevance of the variable selection process in this streaming scenario. Each experiment has been run 25 times, and the benchmark follows the scheme depicted in Figure 2.
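
The resulting grid of runs can be enumerated with a few lines of Python; the fractions and repetition count follow Table 2, while the configuration dictionary itself is just an illustrative way to organize the runs:

```python
from itertools import product

# Two preparatory sizes x with/without variable selection = 4 experiments,
# each repeated 25 times.
prep_sizes = [0.05, 0.20]
variable_selection = [True, False]
n_runs = 25

configurations = [
    {"prep_fraction": p, "use_selection": s, "seed": run}
    for p, s in product(prep_sizes, variable_selection)
    for run in range(n_runs)
]
print(len(configurations))  # 4 experiments x 25 runs = 100 configurations
```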

Table 2. The experimental benchmark for the comparison of the SRs.

The results

Data Exploratory Analysis

The input variables (AT, V, AP, RH) affect the target variable (PE) differently. Figure 4 shows the correlation between the input and the target variables. On the one hand, we observe how an increase in AT produces a decrease in PE, with a minimal vertical spread of scatter points, which indicates a strong inverse relationship between them. The performance reduction due to an increase in temperature is known to stem from the decrease in the density of the inlet air.

Figure 4. Scatter diagram for visualizing the correlation between features, and the linear regression model fit to the data. The image belongs to our open access publication in https://www.mdpi.com/1996-1073/13/3/740/htm.

On the other hand, we can see how an increase in V produces a decrease in PE, also indicating a strong inverse relationship between them. In this case, the spread is slightly larger than for the variable AT, which hints at a slightly weaker relationship. This conclusion is also supported by a correlation value of -0.87 in Figure 5. As seen in Figure 1, the CCPP uses an ST, which leads to a considerable increase in total electrical efficiency; when all other variables remain constant, V is known to have a negative impact on condensing-type turbine efficiency.

Figure 5. Heat map for visualizing the correlation between features. The image belongs to our open access publication in https://www.mdpi.com/1996-1073/13/3/740/htm.

In the case of AP and RH, although PE increases when they increase, Figure 4 depicts a large vertical spread of scatter points, which indicates weak positive relationships; these are confirmed in Figure 5, where the correlation values for these variables are 0.52 and 0.39, respectively. AP also affects the density of the inlet air, and when all other variables remain constant, PE increases with increasing AP. An increase in RH raises the exhaust-gas temperature of the GTs, which leads to an increase in the power generated by the ST.
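
Plots like Figures 4 and 5 can be reproduced with standard pandas/seaborn calls; a minimal sketch, assuming the dataframe `df` loaded earlier, is:

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Scatter plots of each input variable against PE with a linear fit (as in Figure 4).
sns.pairplot(df, x_vars=["AT", "V", "AP", "RH"], y_vars=["PE"], kind="reg")

# Correlation heat map between all variables (as in Figure 5).
plt.figure()
sns.heatmap(df.corr(), annot=True, cmap="coolwarm")
plt.show()
```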

SRs comparison

Tables 3–6 show the error metrics (MSE, RMSE, MAE, R²) and the processing time (TIME) in seconds of each SR for experiments 1, 2, 3 and 4, respectively.
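
For reference, these metrics can be computed from the predictions and targets collected during the test-then-train loop shown earlier; a sketch with scikit-learn metrics and a wall-clock timer:

```python
import time
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

start = time.perf_counter()
# ... run the test-then-train loop here ...
elapsed = time.perf_counter() - start  # TIME, in seconds

mse = mean_squared_error(targets, predictions)
rmse = np.sqrt(mse)
mae = mean_absolute_error(targets, predictions)
r2 = r2_score(targets, predictions)
print(f"MSE={mse:.3f} RMSE={rmse:.3f} MAE={mae:.3f} R2={r2:.3f} TIME={elapsed:.2f}s")
```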

Table 3. Results of experiment 1: variable selection with 5% of preparatory instances. Note that RMSE = MAE because all differences are equal.
Table 4. Results of experiment 2: no variable selection with 5% of preparatory instances. Note that RMSE = MAE because all differences are equal.
Table 5. Results of experiment 3: variable selection with 20% of preparatory instances. Note that RMSE = MAE because all differences are equal.
Table 6. Results of experiment 4: no variable selection with 20% of preparatory instances. Note that RMSE = MAE because all differences are equal.

Table 7 shows the results of the variable selection process in experiments 1 and 3.

Table 7. Variable selection results in each experiment. Selected features are marked with y (yes) and the rest with n (no).

Discussion

The discussion starts by highlighting the relevance of having a representative set of preparatory instances in a SL process. As previously introduced, in streaming scenarios it is not possible to access all historical data, so some strategy is required to make assumptions about the incoming data, unless a drift occurs (in which case an adaptation to the new distribution would be necessary). One of these strategies consists of storing the first instances of the stream (preparatory instances) to carry out a set of preparatory techniques that make the streaming algorithms ready for the streaming process. We have opted for this strategy in our work.

Preparatory techniques contribute to improving the performance of the SRs. Theoretically, by selecting the subset of variables/features (variable or feature selection) that contributes most to the prediction variable, we avoid irrelevant or partially relevant features that can negatively impact the model performance. By selecting the most suitable parameters of the algorithms (hyper-parameter tuning), we obtain SRs better adjusted to the data. And by training our SRs before the streaming process starts (pre-training), we obtain algorithms ready for the streaming process with better performances. The drawback is that the more instances we collect at the beginning of the process, the more time the preparatory techniques will need. This is a trade-off that we have to consider in each scenario, besides the limits previously mentioned.

Regarding the number of preparatory instances, as often occurs with machine learning techniques, the more instances available for training (or other purposes), the better the performance of the SRs can be, because the data distribution is better represented with more data and the SRs are better trained and adjusted to it. On the other hand, the scenario usually imposes limits in terms of memory size, computational capacity, or the moment at which the streaming process has to start, among others. Comparing experiment 1 with experiment 3 (see Tables 3 and 5, where the variable selection process was carried out and the preparatory instances were 5% and 20% of the dataset, respectively) and experiment 2 with experiment 4 (see Tables 4 and 6, where the variable selection process was not carried out and the preparatory instances were again 5% and 20%, respectively), we observe how in almost all cases (except for MTR and MFR when variable selection was carried out) the error metrics improve when the number of preparatory instances is larger. Therefore, by setting aside a group of instances for preparatory purposes, we can generally achieve better results for these stream learners.

In the case of the variable selection process, we deduce from the comparison between Tables 3 and 4 that this preparatory technique improves the performance of RHT and RHAT, and it also reduces their processing time. For PAR, SGDR, and MLPR, it achieves a similar performance while also reducing their processing time. Thus it is recommendable for all of them, except for MTR and MFR, when the preparatory size is 5%. In the case of the comparison between Tables 5 and 6, this preparatory technique improves the performance of PAR and RHAT, and it also reduces or maintains their processing time. For SGDR, MLPR, and RHT, the performances and processing times are very similar. Thus it is also recommendable for all of them, again except for MTR and MFR, when the preparatory size is 20%. Regarding which features have been selected for the streaming process in experiments 1 and 3, Table 7 shows how AT and V have been preferred over the rest by the variable selection method, which is consistent with the data exploratory analysis above given their correlation with the target variable (PE).

Finally, regarding the selection of the best SR, Tables 3–6 show that MLPR and RHT achieve the best error metrics for both preparatory sizes when the variable selection process is carried out. When there is no variable selection process, the best error metrics are achieved by MFR. However, in terms of processing time, SGDR and MTR are the fastest stream learners. Since we have to find a balance between error metrics and processing time, we recommend RHT. It is worth mentioning that, looking at the performance metrics (MSE, RMSE, MAE, and R²), RHT shows better results than RHAT, so we can assume that there are no drift occurrences in the dataset. In case of drifts, RHAT should exhibit better performance metrics than RHT because it has been designed for non-stationary environments.

Conclusions

This work has presented a new approach for this scenario, in which data arrive continuously and regression models have to learn incrementally. This approach is closer to the emerging Big Data and IoT paradigms. The results obtained show how competitive error metrics and processing times have been achieved when applying a SL approach to this specific scenario. Specifically, this work has identified RHT as the most recommendable technique for electrical power production prediction in the CCPP. The article has also highlighted the relevance of the preparatory techniques in making the streaming algorithms ready for the streaming process and, at the same time, the importance of properly selecting the number of preparatory instances. Regarding the importance of the features, as in previous works that tackled the same problem from a batch learning perspective, we recommend carrying out a variable selection process for all SRs (except MTR and MFR) because it reduces the streaming processing time and is, at the same time, worthwhile due to the performance gain.

Finally, as future work, we would like to transfer this SL approach to other processes in combined cycle power plants, and even to other kinds of electrical power plants.

Acknowledgements

I would like to thank the rest of the research team: Igor Ballesteros from UPV/EHU, Izaskun Oregi and Javier Del Ser from TECNALIA, and Sancho Salcedo-Sanz from the University of Alcalá.

References

[1] Black and Veatch. Black and Veatch Strategic Directions: Electric Report; Technical Report; Black and Veatch: Kansas, MO, USA, 2018. Available online: https://www.bv.com/resources/2018-strategic-directions-electric-industry-report (accessed on 28 January 2020).

[2] Kesgin, U.; Heperkan, H. Simulation of thermodynamic systems using soft computing techniques. Int. J. Energy Res. 2005, 29, 581–611.

[3] Zhou, Z.H.; Chawla, N.V.; Jin, Y.; Williams, G.J. Big data opportunities and challenges: Discussions from data analytics perspectives [discussion forum]. IEEE Comput. Intell. Mag. 2014, 9, 62–74.

[4] Lu, J.; Liu, A.; Dong, F.; Gu, F.; Gama, J.; Zhang, G. Learning under Concept Drift: A Review. IEEE Trans. Knowl. Data Eng. 2018, 31, 2346–2363.

[5] De Francisci Morales, G.; Bifet, A.; Khan, L.; Gama, J.; Fan, W. Iot big data stream mining. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 2119–2120.

[6] Tüfekci, P. Prediction of full load electrical power output of a base load operated combined cycle power plant using machine learning methods. Int. J. Electr. Power Energy Syst. 2014, 60, 126–140.

Note

*This article is based on the open access scientific manuscript published in the Energies journal in 2020. It can be accessed at https://www.mdpi.com/1996-1073/13/3/740/htm.

--

Dr. Jesús L. Lobo is currently based in Bilbao (Spain), working at TECNALIA as a Data Scientist and Researcher in Machine Learning and Artificial Intelligence.