Why Your Models Need Maintenance

Martin Liebig (Schmitz), PhD
Towards Data Science
3 min read · May 12, 2017


Photo by Ant Rozetsky, CC0

People often think a given model can be put into deployment and left there forever. In fact, the opposite is true. You need to maintain your models just like you maintain a machine. Machine learning models can drift off or break over time. Does that sound odd because they have no moving parts? Then you might want to take a close look at change and drift of concept.

Change of Concept

Let’s start with an example. If you build a predictive maintenance model for an airplane, you often create columns like

Error5_occured_last_5_mins

as an input for your model. But what happens if error number 5 is not error number 5 anymore? Software updates can drastically change the data you have. They fix known issues, but they can also encode your data in a different way. If you feed post-update data into your pre-update model, it will do something, but not what you expected. This phenomenon is called change of concept.
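As a rough sketch of how you might catch such a change (this is not from the article, and the column names are hypothetical), you can compare the distribution of logged error codes before and after an update:

```python
# Minimal sketch: test whether the distribution of error codes shifted after
# a software update. A significant shift suggests that features such as
# "Error5_occured_last_5_mins" no longer mean what the model learned.
import pandas as pd
from scipy.stats import chi2_contingency

def error_code_shift(pre_update: pd.Series, post_update: pd.Series, alpha: float = 0.01) -> bool:
    """Return True if the error-code distribution changed significantly."""
    # Contingency table: rows = period (pre/post), columns = error codes.
    table = pd.crosstab(
        index=["pre"] * len(pre_update) + ["post"] * len(post_update),
        columns=pd.concat([pre_update, post_update], ignore_index=True),
    )
    _, p_value, _, _ = chi2_contingency(table)
    return p_value < alpha
```

If this fires right after a firmware rollout, that is a strong hint the model needs to be checked and probably rebuilt.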

Drift of Concept

A very similar phenomenon is drift of concept. This happens when the change is not drastic but emerges slowly. An industrial example is the encrustment of a sensor: it builds up over time, and a measured 100 degrees is no longer really 100 degrees. An example from customer analytics is the adoption of new technology. People did not start using iPhones all at once; they adopted them gradually. A column like “HasAnIphone” indicated a very tech-savvy person in 2007. Today it indicates an average person.
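A minimal sketch of how slow drift in a single numeric feature, such as a temperature sensor, could be spotted (my illustration, not from the article): compare a recent window of readings against a fixed reference window with a two-sample test.

```python
# Minimal sketch: detect drift in a sensor reading by comparing recent values
# against a historical reference window with a two-sample KS test.
import numpy as np
from scipy.stats import ks_2samp

def drift_detected(reference: np.ndarray, recent: np.ndarray, alpha: float = 0.01) -> bool:
    """True if the recent readings no longer look like the reference readings."""
    _, p_value = ks_2samp(reference, recent)
    return p_value < alpha

# Example: an encrusted sensor slowly reads low, even though the true value is 100.
reference = np.random.normal(loc=100.0, scale=2.0, size=5000)  # historical readings
recent    = np.random.normal(loc=96.5, scale=2.0, size=500)    # drifted readings
print(drift_detected(reference, recent))  # -> True
```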

What Can I Do?

An example of window-based relearning. The pattern to detect (circles) moves over time. Only recent data points are included to build the model.

A common approach to overcome concept drift is window-based relearning. Imagine that your model is built on last year’s data. This window moves over time, so you can catch drifts of concept. While this sounds very nice, it runs into practical problems. One problem is the limited amount of training data: the smaller the window, the smaller the sample size.
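A minimal sketch of window-based relearning, assuming a pandas DataFrame with a timestamp column, feature columns, and a label column (these names are my own, not from the article):

```python
# Minimal sketch: refit the model on the most recent window of data only.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def retrain_on_window(df: pd.DataFrame, feature_cols, window_days: int = 365):
    """Fit a fresh model on the last `window_days` of data."""
    cutoff = df["timestamp"].max() - pd.Timedelta(days=window_days)
    window = df[df["timestamp"] >= cutoff]  # drop everything older than the window
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(window[feature_cols], window["label"])
    return model
```

Re-running this on a schedule (for example nightly) lets the model follow the drift; shrinking `window_days` reacts faster but leaves fewer training examples, which is exactly the trade-off noted above.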

This approach helps a lot in reducing issues with concept drift. But what about concept change?

Handling Concept Changes

Changes that are introduced by humans and that could impact the model need to be reported, and the model needs to be adapted. The hard part is that the person in charge of the changes needs to be aware that a model is affected by them. An ML-aware company culture is key to success. The impossible part is that you sometimes cannot build a model on “new” hardware right away. You simply do not have data for a new model yet and might need to wait a while until you do.

Graph resulting from backtesting. Real performance (crosses) is compared to cross-validation performance. It is decreasing over time and results in a notification.

Another idea is backtesting. In most cases, you eventually know what actually happened, so you can compare your model’s predictions to reality. On the other hand, your cross-validation gives you an estimate of the true performance of your method. You can compare these two figures on a daily basis. If the underlying concept switches, you should see a drop in performance. Dashboarding and notification approaches are a good way to put this into action.
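A minimal sketch of that comparison (my illustration; the column names and the tolerance threshold are assumptions): once the true outcomes are known, compute the realized accuracy per day and flag any day that falls noticeably below the cross-validation estimate.

```python
# Minimal sketch: compare realized daily accuracy to the cross-validated
# estimate and collect alerts when it drops below a tolerance band.
import pandas as pd
from sklearn.metrics import accuracy_score

def daily_backtest(predictions: pd.DataFrame, expected_accuracy: float, tolerance: float = 0.05):
    """`predictions` needs columns: date, y_pred, y_true (filled in once reality is known)."""
    alerts = []
    for date, day in predictions.groupby("date"):
        realized = accuracy_score(day["y_true"], day["y_pred"])
        if realized < expected_accuracy - tolerance:
            alerts.append((date, realized))  # feed this into a dashboard or notification
    return alerts
```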

Key takeaway: Machine Learning competence is critical

This article demonstrates why the common practice of “model and run” is a bad one. Companies need to keep a careful eye on maintaining their machine learning competence. Besides retaining the people involved, it is also key to have one common platform and well-documented code.
