Why Is Re-training ML Models Important?

A Product Manager’s Perspective

Humberto Corona (totó pampín)
Towards Data Science

--

As a product manager, you are responsible for measuring the continuous success of your product. That might include validation before launching, measuring uplifts in an A/B test while launching, and keeping track of core KPIs. If you are managing a Machine Learning product, the long-term success of your product will depend on keeping your models up-to-date. In this post I explain why this is an important problem and how you can ensure that continuous success through model re-training.

If you don’t re-train your ML model, it can deteriorate to the point where you no longer know what is going on! (Untitled collage by the Author, 2020)

Why you should re-train your ML Model

ML models rely on data to “understand” a particular problem and generate the desired output. In most cases, the data your model depends on will change both gradually (for example, because user preferences change) and dramatically, due to the nature of the product (for example, because of a sales event like Black Friday, or COVID-related travel restrictions). Data can also change due to unforeseen changes in the system, such as a currency field switching from dollars to cents without prior notice.

Throughout this post, I will use two examples that you might be familiar with: a product that recommends songs, and a product that detects a plant species from a photo. I will explain how differently these two examples behave when it comes to model re-training.

You don’t know what you don’t know

And you should know how your model is performing. So, before even investing in model re-training, make sure your ML system (not only the model) has monitoring that enables you to see if performance is degrading over time. Without that key piece of information, you simply won’t know how big of a problem you might have — and that should be VERY scary! 👻

  • Because most models rely on the premise that the distribution of the data they saw during training is the same as the data they will see in the future, monitoring feature drift is a great indicator for when to re-train (a minimal sketch of such a check follows this list).
  • Monitoring prediction drift or prediction accuracy is probably one of the best indicators of re-training needs, because these metrics measure how the output of your model changes and performs over time.
  • The ML model does not live in isolation, so measuring the output of the model from a user experience point of view is very important — If you want to learn more about this topic, I highly recommend this very interesting talk by Lina Weichbrodt, “Measuring operational quality of recommendations”, from the ACM RecSys’18 Conference.
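
To make the first bullet concrete, here is a minimal sketch of a feature-drift check, assuming you keep a snapshot of each feature’s values from training time. The two-sample Kolmogorov–Smirnov test and the 0.05 threshold are just one reasonable choice, not the only way to do this.

```python
# Minimal drift check: compare a feature's live distribution against the
# snapshot taken at training time with a two-sample Kolmogorov-Smirnov test.
# The threshold and sample sizes below are illustrative assumptions.
import numpy as np
from scipy.stats import ks_2samp

def feature_drift_alert(train_values, live_values, alpha=0.05):
    """Return True if the live distribution differs significantly
    from the training-time distribution of this feature."""
    statistic, p_value = ks_2samp(train_values, live_values)
    return p_value < alpha

# Synthetic example: average daily listening hours shift upwards over time.
rng = np.random.default_rng(42)
train_hours = rng.normal(loc=2.0, scale=0.5, size=5_000)  # at training time
live_hours = rng.normal(loc=2.6, scale=0.5, size=5_000)   # today

if feature_drift_alert(train_hours, live_hours):
    print("Feature drift detected, consider re-training.")
```

The same pattern works for prediction drift: compare the distribution of the scores your model produces today against the scores it produced at validation time.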

Deciding how often to re-train your ML Model

Some models can handle unseen values and still make good predictions, while others struggle with them. If your data changes very fast, you might want to build a model that deals well with unseen values and re-train more frequently than if the data changes slowly. This is particularly important in the case of recommender systems.

An app for detecting plant species from a photo might use colors and shape detectors as features to distinguish plants. Because ~2,000 new plant species are discovered and named every year, you should probably ensure your system recognizes the latest species by re-training the ML model with images of those plants as they are described. How quickly you need to do this depends on your users: if your product is designed for researchers, it should probably always be up to date with the latest discoveries to support their research.

For systems where there is human interaction, a very interesting problem is behavioral change, and understanding how your model deals with it. We call this “drift”. As mentioned before, behavior changes can be slow, steady, and predictable, or sharp and unexpected. For example, music taste changes as people grow and new genres appear and become popular every year; but every now and then a song goes viral and everyone is listening to it, or it is Christmas time again and people in some parts of the world are listening to Christmas carols. Both scenarios should be accounted for when recommending music.

If a model is very easy to update, even if that means fully re-computing its parameters, and the costs associated with re-training are low, you might decide to just schedule re-training regularly and forget about the rest of this post. However, when the problem is a high-stakes one, or the cost of re-training a model is very high (GPT-3 could cost $4.6M to re-train), optimizing how often you re-train can either make you a lot of money or save you a lot of money!
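
As a toy illustration of that trade-off (the linear-decay assumption and all numbers here are mine, not from any real product): if a stale model costs you roughly d extra dollars per day for every day since the last re-train, and each re-training run costs C, then re-training every T days costs on average d·T/2 + C/T per day, which is minimized at T* = √(2C/d).

```python
# Toy model for choosing a re-training interval. It assumes, hypothetically,
# that staleness costs grow linearly: t days after re-training, you lose
# decay_cost_per_day * t dollars per day. Real decay curves should come from
# your own time-aware offline evaluation (see the next paragraphs).
import math

def optimal_retrain_interval_days(retrain_cost, decay_cost_per_day):
    # Average daily cost over an interval T is:
    #   decay_cost_per_day * T / 2 + retrain_cost / T
    # and setting its derivative to zero gives T* = sqrt(2C / d).
    return math.sqrt(2 * retrain_cost / decay_cost_per_day)

# Hypothetical numbers: a $500 re-training run, and $20/day of extra loss
# for each day of staleness.
print(f"Re-train roughly every {optimal_retrain_interval_days(500, 20):.1f} days")  # ~7.1
```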

The same way you might run a parameter grid search to find the optimal parameter set for your machine learning model, you should run a time-aware evaluation of the model to choose the re-train frequency in an offline evaluation setting (I talk about offline evaluation in a previous Medium post). In particular:

  • How many data samples does the model need to reach its peak performance? For most algorithms, the return on investment of having more training samples peaks at some point; others might be data-hungry.
  • What do you do with the old data? You can either drop it or have the algorithm give it “less importance”, for example through sample weights.
  • And, of course, what happens when the model has not been re-trained for N days? The sketch below shows one way to measure exactly that.
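
Here is a minimal backtesting sketch for that last question. Everything in it is synthetic and illustrative: a slowly rotating label boundary stands in for real-world drift, and a logistic regression stands in for your model.

```python
# Time-aware evaluation sketch: train once on "day 0", freeze the model,
# then score it on later days to see how quickly performance decays.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def make_day(day_index, n=500, drift_per_day=0.05):
    """Generate synthetic data whose decision boundary rotates over time."""
    X = rng.normal(size=(n, 2))
    w = np.array([1.0, drift_per_day * day_index])  # simulated drift
    y = (X @ w + rng.normal(scale=0.5, size=n) > 0).astype(int)
    return X, y

# Train once, then measure AUC as the model grows stale.
X_train, y_train = make_day(0, n=5_000)
model = LogisticRegression().fit(X_train, y_train)

for day in (1, 10, 20, 30):
    X_eval, y_eval = make_day(day)
    auc = roc_auc_score(y_eval, model.predict_proba(X_eval)[:, 1])
    print(f"day {day:2d} since training: AUC = {auc:.3f}")
```

Plotting the metric against days since training gives you the decay curve that feeds the cost trade-off discussed above.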

How to keep your model up-to-date?

Based on your needs, there are three main ways in which you can keep your model up-to-date.

  • Light updates, where the model is re-trained on new data, and perhaps forgets some old data (see the sketch after this list).
  • Heavy updates, needed when the problem has drifted quite a lot, so the model parameters need to be re-calculated from scratch.
  • Sometimes the data changes so dramatically that it is time to completely re-think your model, or even create a new one; think COVID-19 type of change.
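
As one concrete illustration of light vs. heavy updates, here is a sketch using scikit-learn’s SGDClassifier, purely as an example; your own model and framework may look very different.

```python
# Light vs. heavy updates with an incremental scikit-learn model.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(1)
X_old = rng.normal(size=(1_000, 5))
y_old = (X_old[:, 0] > 0).astype(int)

model = SGDClassifier()
model.partial_fit(X_old, y_old, classes=np.array([0, 1]))

# Light update: nudge the existing parameters with a fresh batch of data.
X_new = rng.normal(size=(200, 5))
y_new = (X_new[:, 0] > 0).astype(int)
model.partial_fit(X_new, y_new)

# Heavy update: the problem has drifted, so re-fit from scratch, here
# down-weighting the old samples instead of dropping them entirely.
weights = np.concatenate([np.full(len(y_old), 0.3), np.ones(len(y_new))])
model = SGDClassifier().fit(
    np.vstack([X_old, X_new]),
    np.concatenate([y_old, y_new]),
    sample_weight=weights,
)
```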

Avoiding human error by automating re-training

Of course, having someone looking at a monitor and manually deciding when to re-train a model can be very costly, and also prone to human error. You should invest in an automation pipeline for your model that re-trains it either based on an event (the deterioration of a metric) or periodically, as in the sketch below. I will discuss this topic in more detail in a future post.
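
The triggering logic itself can be very simple. Here is a bare-bones sketch; the metric, the thresholds, and the job it kicks off are all placeholders for your own pipeline.

```python
# Re-train either when a monitored metric degrades past a threshold
# (event-based) or when the model is older than a maximum age (periodic).
from datetime import datetime, timedelta, timezone

MAX_MODEL_AGE = timedelta(days=14)  # illustrative
MIN_ACCEPTABLE_AUC = 0.70           # illustrative

def should_retrain(last_trained_at: datetime, live_auc: float) -> bool:
    too_old = datetime.now(timezone.utc) - last_trained_at > MAX_MODEL_AGE
    degraded = live_auc < MIN_ACCEPTABLE_AUC
    return too_old or degraded

# This check would run on a schedule (a daily cron job, or a task in an
# orchestrator such as Airflow) and start the training job when it fires.
if should_retrain(datetime(2020, 11, 1, tzinfo=timezone.utc), live_auc=0.68):
    print("Kicking off re-training job")
```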

When re-training is not enough

We would wish that re-training a model solved all the problems related to a loss in performance. However, that is not always the case: sometimes things go wrong elsewhere, and false alarms are raised. For example, the distribution of the data might change because tracking is broken, or new feature values might appear because another team changed the price format and forgot to tell you. In those cases, re-training is not the right approach to solve the problem, but having proper monitoring will help you uncover it very fast.

To summarise, in this blog post I highlighted some of the questions you should answer before deciding how to keep your model performing when the world around it is changing, which metrics you should monitor that will either automate or flag a need for model re-training, and what you can do to keep up with the change.

It is for you to decide how the ML model should adapt to changes, taking into consideration (1) the technical complexity of doing so, (2) the costs associated with keeping it up to date, and, of course, (3) the return on investment. I highly recommend investing in monitoring as a starting point, to understand not if, but how big, your problem is, and then take it from there. Oh, and rather be safe than sorry!
