
Introduction
In this article, I will introduce the first of a collection of modules to be developed for analyzing data from operating wind turbines: the Iterative Power Curve Filter.
Theoretically, the power output from a wind turbine (WT) is proportional to the cube of wind speed. A plot of this relationship is known as the power curve, and it is perhaps the most important plot in wind energy analysis.
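As a rough illustration of the cubic relationship, the sketch below evaluates the standard expression for power extracted from the wind, P = 0.5 * rho * A * Cp * v^3. The rotor diameter, air density, and power coefficient are assumed values chosen for illustration only and do not describe any particular turbine. Real machines also cap their output at rated power, which this sketch ignores.

```python
import math


def theoretical_power_kw(wind_speed, rotor_diameter=82.0,
                         air_density=1.225, power_coefficient=0.4):
    """Power in kW from P = 0.5 * rho * A * Cp * v^3.

    All default values are illustrative assumptions:
    rotor_diameter in m, air_density in kg/m^3, wind_speed in m/s.
    """
    swept_area = math.pi * (rotor_diameter / 2) ** 2  # rotor swept area, m^2
    power_w = 0.5 * air_density * swept_area * power_coefficient * wind_speed ** 3
    return power_w / 1000.0


# Doubling the wind speed multiplies the power by eight.
for v in (4, 8, 12):
    print(f"{v} m/s -> {theoretical_power_kw(v):.0f} kW")
```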
Original equipment manufacturers (OEMs) provide a theoretical power curve that maps the input wind speed to output power given ideal conditions. However, this relationship rarely holds for operational turbines for many reasons, such as the wind farm terrain, placement of the anemometer, efficiency issues, and wake effects caused by proximity to other turbines.
Therefore, it is of interest to understand the actual relationship between wind speed and power for operating turbines. This relationship is aptly named the operational power curve, and it can be quite different from the theoretical power curve.
An operational power curve gives the relationship between wind speed and power output for a given turbine during normal operating conditions. Normal operating conditions are often defined as data points with no downtime, no faults, and no events.
Operational power curves are created by cleaning time series data collected from WTs through hundreds of sensors. The data is typically logged at about 0.0017 s^-1 (1/600 Hz), that is, one record every 10 minutes.
WT sensors are programmed with thresholds that raise alarms when abnormal operating conditions are detected. The system that collects these measurements and alarms is known as the Supervisory Control and Data Acquisition (SCADA) system.
However, simply filtering the SCADA data based on triggered alarms is often insufficient to obtain data points corresponding to normal operating conditions. As a result, statistical filtering is required.
SCADA data filtering is a fundamental part of many wind energy analytics projects, including turbine underperformance analysis, side-by-side comparison of units, anomaly detection, and data-driven automation of processes using machine learning techniques.
Hence, developing off-the-shelf modules for pre-processing SCADA data will benefit engineers and analysts by significantly reducing the time and effort typically used for data cleaning while lowering the entry barrier for leveraging advanced analytics in the industry.
Filtering procedure

The Iterative Power Curve Filter can handle multiple turbines across several wind farms if there is a unique identifier for each unit. The procedure consists of two main steps as outlined below:
Primary Filtering
- Downtime data points are removed.
- Likely faults are excluded. This step is empirical and the idea is to remove unreasonable production data at moderate to high wind speeds.
Secondary Filtering
- Compute statistical parameters (mean and standard deviation) of the partially filtered power curve from the primary filtering process.
- Exclude data points that lie more than x standard deviations from the mean, where x is specified by the user.
- The two secondary filtering steps above are repeated for a number of cycles selected by the user.
A related procedure was described in the M.S. thesis of Abdul Mouez Khatab – "Performance analysis of operating wind turbines", 2017.
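The two filtering stages above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the library's actual implementation: downtime is assumed to mean non-positive power at or above the cut-in speed, the mean and standard deviation are assumed to be computed per wind speed bin, and the function name iterative_filter is hypothetical. The parameter names are borrowed from the constructor arguments shown in the module usage section.

```python
from collections import defaultdict
from statistics import mean, pstdev


def iterative_filter(points, cut_in_speed=3.0, bin_interval=0.5,
                     z_coeff=2.5, filter_cycle=5):
    """Split (wind_speed, power) pairs into (normal, abnormal) lists."""
    normal, abnormal = [], []

    # Primary filtering (assumption): no production at or above cut-in
    # wind speed is treated as downtime or a likely fault.
    for ws, p in points:
        if ws >= cut_in_speed and p <= 0:
            abnormal.append((ws, p))
        else:
            normal.append((ws, p))

    # Secondary filtering: repeat the per-bin z-score cut for a few cycles.
    for _ in range(filter_cycle):
        bins = defaultdict(list)
        for ws, p in normal:
            bins[int(ws // bin_interval)].append(p)
        stats = {b: (mean(ps), pstdev(ps)) for b, ps in bins.items()}

        kept = []
        for ws, p in normal:
            mu, sigma = stats[int(ws // bin_interval)]
            if abs(p - mu) <= z_coeff * sigma:
                kept.append((ws, p))
            else:
                abnormal.append((ws, p))

        if len(kept) == len(normal):  # nothing removed, so stop early
            break
        normal = kept

    return normal, abnormal


# Synthetic demo data: one low-power outlier and one downtime record.
demo = [(8.1, 1000.0 + i) for i in range(20)] + [(8.2, 100.0), (10.0, 0.0)]
normal_pts, abnormal_pts = iterative_filter(demo)
print(len(normal_pts), "normal points,", len(abnormal_pts), "abnormal points")
```

Repeating the cut matters because a large outlier inflates the standard deviation of its bin on the first pass; once it is removed, the recomputed statistics are tighter and can expose smaller deviations.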
Module usage
The scada-data-analysis library hosted on PyPI contains the filtering module, and details of the code implementation can be found on GitHub. The library can be installed with a simple pip command, as shown here.
# Pip install library
pip install scada-data-analysis
In addition, the project GitHub repo may be cloned as follows:
# Clone github repo
git clone https://github.com/abbey2017/wind-energy-analytics.git
Using this library, you can filter messy SCADA data in 4 steps as shown below:
# Import relevant libraries
import pandas as pd
from scada_data_analysis.modules.power_curve_preprocessing import PowerCurveFiltering
# Load turbine scada data
df = pd.read_csv('path/to/data')
# Instantiate power curve filtering class
pc_filter = PowerCurveFiltering(
    turbine_label='Wind_turbine_name',
    windspeed_label='Ws_avg',
    power_label='P_avg',
    data=df,
    cut_in_speed=3,
    bin_interval=0.5,
    z_coeff=2.5,
    filter_cycle=5,
    return_fig=True,
    image_path='../images',
)
# Process raw scada data
normal_df, abnormal_df = pc_filter.process()
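Once the normal data points have been separated, an operational power curve can be estimated by averaging power within wind speed bins. The sketch below shows one possible way to do this in plain Python. Note that binned_power_curve is a hypothetical helper and not part of scada-data-analysis, and the 0.5 m/s bin width is an assumption.

```python
from collections import defaultdict
from statistics import mean


def binned_power_curve(points, bin_interval=0.5):
    """Return sorted (bin_centre, mean_power, count) rows
    from an iterable of (wind_speed, power) pairs."""
    bins = defaultdict(list)
    for ws, p in points:
        bins[int(ws // bin_interval)].append(p)
    return [((b + 0.5) * bin_interval, mean(ps), len(ps))
            for b, ps in sorted(bins.items())]


# With the filtered DataFrame from above, the pairs could be supplied as:
# curve = binned_power_curve(zip(normal_df['Ws_avg'], normal_df['P_avg']))
sample = [(3.1, 10.0), (3.4, 20.0), (5.0, 100.0)]
for centre, avg_power, count in binned_power_curve(sample):
    print(centre, avg_power, count)
```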
Results
The GitHub repo for this project has example datasets and a Jupyter Lab notebook demonstrating the power curve filter. Sample results are shown here based on publicly available SCADA data from the La Haute Borne wind farm in France, operated by Engie.


A peek into the future
Results from the power curve filtering module may be used as ground truth for training machine learning models that can be deployed for near real-time monitoring of WT performance or other advanced control technologies.
New modules aimed at enabling engineers and analysts to leverage advanced analytics will be added to the wind energy analytics toolbox. Some current ideas include modules for generating models that can estimate/predict expected power from WTs based on historical SCADA data, faulty equipment detection, and underperformance categorization modules.
I would like to hear your feedback on this article and the wind energy analytics toolkit. Module suggestions including likely applications are welcome. Please send an email to the project maintainer at [email protected] or open an issue on the project’s GitHub page.