Introduction
Have you ever gotten great results on your test set only to have your model perform poorly in production after some time? If so, you might be experiencing model decay. Model decay is the gradual decline in the performance of a machine learning model over time. In this article we will discuss how data drift causes model decay and how to set up early detection for drift.

What is drift in machine learning?
In machine learning, model drift refers to a change in the underlying distribution of the data that a model has been trained on, leading to a decrease in its performance on new, unseen data. This can occur when a model is deployed in a real-world setting and the distribution of data it encounters changes over time. For example, a model trained on pre-COVID data may not perform as well on data from the COVID-19 pandemic due to changes in the underlying distribution of the data.
Why is it important to track model drift?
Tracking drift is important because it can help ensure that a machine learning model continues to make accurate predictions over time. As the model is deployed in the real world, the distribution of data it encounters may change, which can cause the model’s performance to degrade. By tracking drift, we can detect when this occurs and take appropriate action to adapt the model, such as retraining it on new data. This can help prevent the model from making increasingly inaccurate predictions, which can have serious consequences in applications such as fraud detection, credit scoring and medical diagnosis. Tracking model drift is also important for compliance and regulatory reasons: organizations might be required to maintain accurate records and an auditable trail of their model’s performance over time.
Types of drift
Here are the different types of drift that might affect your models.
- Concept drift is a change in the relationship between the independent variables and the target variable. This occurs when the underlying concept or task that the model is trying to learn changes over time. For example, a model trained to detect fraudulent credit card transactions may experience concept drift if the type of fraud changes over time.
- Covariate shift is a shift in the independent variables. This occurs when the distribution of the input variables changes over time, but the underlying concept or task remains the same. For example, a model trained on data from one geographic location may experience covariate shift if it is deployed in a different location with a different distribution of input variables.
- Prior probability shift is a shift in the target variable. For example, a model trained on a dataset where the classes are balanced may experience prior probability shift if it is deployed on a dataset where one class is much more prevalent than the other.
Methods to detect drift
Here are some common ways to detect model decay and drift.
Monitor model performance
This involves calculating metrics such as MSE and RMSE for regression models, and ROC AUC, accuracy, precision, recall and F1 for classification models. A large deviation between production and test performance can raise an alarm about potential drift.
However, this method might be impractical in situations where there is a large time gap between the time of prediction and obtaining the ground truth. For example, consider a bank telemarketing campaign where a machine learning model predicts a customer’s propensity to buy a particular product. The campaign might last for a few months, and we can only conclude whether a customer made a purchase at the end of the campaign period. If we rely solely on model performance as an indicator of model drift, we will only be alerted to drift at the end of the campaign.
While model performance is a useful indicator, it is a lagging indicator. We can take a proactive approach in detecting drift by monitoring the input features.
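As a minimal sketch of this approach for a regression use case, assuming we already have ground-truth labels and predictions for both the test period and a recent production window, we could compare the same metric across the two and raise a flag when the gap exceeds a tolerance (the 20% tolerance below is purely illustrative):
from sklearn.metrics import mean_squared_error

def rmse_degradation_alert(y_test, pred_test, y_prod, pred_prod, tolerance=0.2):
    # compare RMSE on the held-out test set against RMSE on a production window
    rmse_test = mean_squared_error(y_test, pred_test, squared=False)
    rmse_prod = mean_squared_error(y_prod, pred_prod, squared=False)
    # flag potential drift when production RMSE is more than `tolerance` worse than test RMSE
    return rmse_prod > rmse_test * (1 + tolerance)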
Monitor changes in input features
A simple way to monitor changes in input features is through descriptive statistics. Descriptive statistics are summary numbers that describe a set of data. Common descriptive statistics for numerical values are the mean, median, mode, minimum and maximum. A change in these statistics can raise an alert on potential drift.
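For instance, a minimal pandas sketch could compare the summary statistics of a numeric feature between the reference (training) data and a recent production window; the 10% relative-change threshold below is arbitrary and only for illustration:
import pandas as pd

def describe_shift(reference: pd.Series, current: pd.Series, rel_threshold=0.1):
    # summary statistics for the reference and current samples of one numeric feature
    stats = pd.DataFrame({
        'reference': [reference.mean(), reference.median(), reference.min(), reference.max()],
        'current': [current.mean(), current.median(), current.min(), current.max()],
    }, index=['mean', 'median', 'min', 'max'])
    # flag statistics whose relative change exceeds the threshold
    stats['rel_change'] = (stats['current'] - stats['reference']).abs() / stats['reference'].abs()
    stats['alert'] = stats['rel_change'] > rel_threshold
    return stats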
We can also monitor changes in the distributions of input features. Common statistical tests and metrics used to monitor changes in distributions are the Kolmogorov-Smirnov test, Population Stability Index (PSI), Wasserstein distance (also known as Earth-Mover distance), Kullback-Leibler divergence and Jensen-Shannon distance.
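As an illustration, the two-sample Kolmogorov-Smirnov test and the Wasserstein distance are both available in scipy; the 0.05 significance level used below is a common but arbitrary choice:
from scipy.stats import ks_2samp, wasserstein_distance

def numeric_drift_check(reference, current, alpha=0.05):
    # two-sample KS test: a small p-value suggests the two samples come from different distributions
    statistic, p_value = ks_2samp(reference, current)
    # Wasserstein (Earth-Mover) distance: magnitude of the shift in the same units as the feature
    distance = wasserstein_distance(reference, current)
    return {'ks_statistic': statistic, 'p_value': p_value, 'wasserstein': distance, 'drift': p_value < alpha}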
In this article we will walk through an example of how to use Evidently, a model monitoring tool in Python that leverages various statistical tests, to detect drift in machine learning models.
Example
In the following example, we will:
- Train a model to predict the housing resale price
- Use Evidently AI’s pre-built reports to monitor the fitted model
Setup
- Visual Studio Code
- Python 3.8
- Python packages required
evidently==0.2.2
scikit-learn==1.1.2
pandas==1.4.3
Get the data
We will be using a subset of the Singapore resale housing price dataset [1]. The dataset, provided by the Housing & Development Board (HDB), contains transactions of resale flats. It includes information such as the year-month of the transaction, flat type, location, size of the flat and the resale price.
import pandas as pd

df = pd.read_csv('path/to/data/resale-flat-prices-based-on-registration-date-from-jan-2017-onwards.csv')

def convert_to_years(x):
    # convert strings such as '61 years 04 months' or '61 years' into a float number of years
    str_split = x.split(' ')
    years = int(str_split[0])
    if len(str_split) == 4:
        months = int(str_split[2])
        total_years = round((years*12 + months)/12, 2)
    else:
        total_years = years
    return total_years

# create date features
df['year_month'] = pd.to_datetime(df['month'])
df['year'] = df['year_month'].dt.year
df['month'] = df['year_month'].dt.month

# drop unused columns, convert the remaining lease to years and rename the label column
df = df.drop(columns=['block', 'street_name'])
df['remaining_lease'] = df['remaining_lease'].apply(convert_to_years)
df = df.rename(columns={'resale_price': 'target'})
We perform the following pre-processing steps:
- Create the date features year and month
- Convert the remaining_lease column from string to float
- Rename the resale_price column to target
Let’s split the data into 3 sets based on the transaction date.
- Train: the set we use for training. It contains data from 2020, and the labels in this set are known to us.
- Test: the holdout set which we use to get the test result. It contains data from 2021, and the labels in this set are known to us.
- Score: the set of unseen records used for scoring in production. It contains data from 2022. We should not have the labels for this set, so we drop the target column to simulate a real-world case.
# split data
df_train = df.loc[(df['year_month'] >= '2020-01-01') & (df['year_month'] < '2021-01-01')].drop(columns = ['year_month']).sample(n=10000)
df_test = df.loc[(df['year_month'] >= '2021-01-01') & (df['year_month'] < '2022-01-01')].drop(columns = ['year_month']).sample(n=5000)
df_score = df.loc[df['year_month'] >= '2022-01-01'].drop(columns = ['year_month', 'target']).sample(n=5000)
y_train = df_train['target'].copy()
X_train = df_train.drop(columns='target').copy()
y_test = df_test['target'].copy()
X_test = df_test.drop(columns='target').copy()
X_score = df_score.copy()
Train Model
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.impute import SimpleImputer
# one-hot encode categorical features and impute missing values in numerical features
categorical_features = ['town', 'flat_type', 'storey_range', 'flat_model']
categorical_transformer = Pipeline(steps=[('encoder', OneHotEncoder(handle_unknown='ignore'))])
numerical_features = ['floor_area_sqm', 'lease_commence_date', 'remaining_lease']
numerical_transformer = Pipeline(steps=[('impute', SimpleImputer())])

preprocessor = ColumnTransformer(
    transformers=[
        ('cat', categorical_transformer, categorical_features),
        ('num', numerical_transformer, numerical_features)])

# chain the pre-processing with a gradient boosting regressor and fit on the training set
gbr = GradientBoostingRegressor()
regressor = Pipeline([('processing', preprocessor), ('regr', gbr)])
regressor.fit(X_train, y_train)
Prediction
We use the trained regression model to predict on the test, score and train sets. Note that at this point we only have the target column for the train and test sets, not the score set.
df_test['prediction'] = regressor.predict(X_test)
df_score['prediction'] = regressor.predict(X_score)
df_train['prediction'] = regressor.predict(X_train)
Pre-built Reports
Evidently AI comes with a wide range of pre-built metrics and tests known as metric and test presets. These are groups of relevant metrics or tests presented to you in a single report.
Below are some metric presets:
- DataQualityPreset: evaluates data quality and provides descriptive statistics
- DataDriftPreset: evaluates data drift in individual columns and the dataset
- TargetDriftPreset: evaluates prediction or target drift
- RegressionPreset: evaluates the quality of a regression model
- ClassificationPreset: evaluates the quality of a classification model
Below are some test presets:
- NoTargetPerformanceTestPreset: evaluates data drift in the prediction column and runs data quality checks across all columns
- DataDriftTestPreset: evaluates data drift in individual columns and the dataset
- DataQualityTestPreset: evaluates data quality and provides descriptive statistics
Pre-built Metric
Let’s examine how the metric presets work.
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset, TargetDriftPreset

# the train set is the reference dataset and the test set is the current dataset
report = Report(metrics=[
    DataDriftPreset(drift_share=0.3),
    TargetDriftPreset(),
])
report.run(reference_data=df_train, current_data=df_test)
report.save_html('evidently_metrics_report.html')
We set the train and the test set as the reference and current dataset respectively. Notice that we did not choose which statistical test to perform for each column; Evidently makes that choice for us based on characteristics of the input data. You can read more about how these decisions are made in the Evidently documentation.
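If you would rather choose the statistical test yourself, Evidently also accepts an explicit stattest. A small sketch, assuming the ColumnDriftMetric API and the 'wasserstein' and 'psi' stattest names available in version 0.2.x (check the documentation for your version):
from evidently.report import Report
from evidently.metrics import ColumnDriftMetric

# assumed usage: request specific tests for individual columns instead of the automatic defaults
report = Report(metrics=[
    ColumnDriftMetric(column_name='floor_area_sqm', stattest='wasserstein'),
    ColumnDriftMetric(column_name='remaining_lease', stattest='psi'),
])
report.run(reference_data=df_train, current_data=df_test)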
We can either display the HTML as a Jupyter Notebook cell output or save it as an HTML file. Here is how the HTML file looks when we open it in the browser.
The report contains the following:
- A summary of number and proportion of columns where drift is detected.
- Data distribution and drift magnitude for every column
- Correlation between features and target / prediction
The results can also be output as JSON or a Python dictionary in the following manner:
report.json()
#OR
report.as_dict()
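The dictionary output is convenient for automation. As a rough sketch, assuming the DataDriftPreset result includes a DatasetDriftMetric entry with a dataset_drift flag and a share of drifted columns (key names may differ slightly between Evidently versions):
result = report.as_dict()

# assumed structure: a list of metrics, each with a 'metric' name and a 'result' payload
for item in result['metrics']:
    if item['metric'] == 'DatasetDriftMetric' and item['result'].get('dataset_drift'):
        share = item['result'].get('share_of_drifted_columns')
        print(f'Dataset drift detected, share of drifted columns: {share}')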
Pre-built Test
We can use the test preset in similar fashion. We set the train and score set as the reference and current dataset respectively.
from evidently.test_suite import TestSuite
from evidently.test_preset import NoTargetPerformanceTestPreset

tests = TestSuite(tests=[
    NoTargetPerformanceTestPreset(),
])
tests.run(reference_data=df_train.drop(columns='target'), current_data=df_score)
tests.save_html('evidently_tests_report.html')
Note that the score set only has predictions; it does not yet have the ground truth, i.e. the target column, hence the target column in the reference dataset was dropped. Here is how the result looks.
NoTargetPerformanceTestPreset provides a concise summary of data drift, quality and integrity. The result can also be output as JSON or a Python dictionary in the following manner:
tests.json()
#OR
tests.as_dict()
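In a production pipeline we would usually not open the HTML report by hand. Instead, we can inspect the dictionary output and trigger an alert or a retraining job whenever a test fails. A rough sketch, assuming each entry in the 'tests' list carries 'name', 'description' and 'status' fields (verify against the output of tests.as_dict() for your Evidently version):
result = tests.as_dict()

# assumed structure: every test reports its name, a human-readable description and a status
failed = [t for t in result['tests'] if t['status'] == 'FAIL']
for t in failed:
    print(f"{t['name']}: {t['description']}")
# if any test failed, hook in your alerting or retraining logic here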
Conclusion
In conclusion, model drift in machine learning refers to a change in the underlying distribution of data that can result in decreased performance. It is important to track model drift to ensure the accuracy of predictions and for compliance and regulatory reasons. There are different types of drift, including concept drift, covariate shift and prior probability shift. Methods to detect drift include monitoring model performance and descriptive statistics of input features, as well as monitoring changes in the distributions of input features using statistical tests. By using tools like Evidently AI, we can proactively detect and address model drift to ensure the performance and reliability of our machine learning models over time.
Reference
[1] Contains information from HDB Resale Flat Prices, accessed on 31st Jan 2023 from Data.gov.sg, which is made available under the terms of the Singapore Open Data Licence version 1.0.