
As a data scientist or machine learning engineer, most of your work revolves around data: processing it, transforming it, and building models that make predictions. On top of that, you are often handed an additional task, which is to deploy the product in real time. After doing the heavy lifting of finding the right parameters for various models and finally arriving at the best one, deploying that model in real time is what ultimately impresses the business and creates monetary impact.
Finally, the model is deployed, and it makes predictions based on the historical data on which it was trained. At this point, most people consider the bulk of the machine learning work to be done. While it is true that a good amount of effort has gone into productionizing the models, there is an additional step in the machine learning lifecycle that is often overlooked: monitoring the models and checking how they perform on future data, that is, data the models have never seen before.

Though a good amount of time was spent training the ML models and choosing the best evaluation metric for our task, there can always be situations where the data seen in production is distributed completely differently from the training data. This can sound a bit complicated, so let me take a moment to simplify it. For context, imagine you trained a robust recommender system in 2005 to recommend books to various users. Back then, the Harry Potter series was extremely popular and would be recommended most of the time: a user looking for children's fiction would very likely be pointed to Harry Potter. Today, however, it would make little sense to recommend only the Harry Potter series when there are plenty of other works by various authors that have become bestsellers. This is not to discredit the work of J.K. Rowling (the author of the series) but to showcase how a recommendation model trained on data from one period may not perform at its best on the data it encounters during the production phase. In short, problems arise when we train a model on a particular set of data and expect it to perform well on data whose distribution is completely different. Let us now go over the potential problems we might face if we do not monitor our ML and deep learning models after production.
Data Drift

We test a large number of models to find the best one according to an evaluation metric such as mean squared error, or whichever metric the business requirements call for. We also take the further step of dividing the data into training and test sets to get a sense of how the model might behave in real time. One of the fundamental assumptions behind the test set is that its distribution closely resembles the data we will see in production. If the production data reflects the test data, the model should perform similarly to how it did during testing. Nonetheless, there can always be situations where the data arriving in production is distributed quite differently from the test data. This phenomenon is known as data drift, and it can cost companies a good deal of profit if left unchecked. Therefore, it is crucial to constantly monitor models after deployment to see whether their behavior matches what was expected during the testing phase.
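To make this concrete, here is a minimal sketch of how such a drift check could look for a single numeric feature, using a two-sample Kolmogorov-Smirnov test. The feature values are synthetic and the 0.05 threshold is an illustrative assumption, not a value from any specific production system.

```python
# A minimal sketch of a data-drift check on one numeric feature.
import numpy as np
from scipy.stats import ks_2samp

def check_feature_drift(train_values, production_values, alpha=0.05):
    """Compare a feature's training distribution with recent production data."""
    statistic, p_value = ks_2samp(train_values, production_values)
    drifted = p_value < alpha  # small p-value -> distributions likely differ
    return {"ks_statistic": statistic, "p_value": p_value, "drift_detected": drifted}

# Example usage with synthetic data standing in for real feature values.
rng = np.random.default_rng(42)
train_ages = rng.normal(loc=35, scale=8, size=5_000)   # training-time distribution
prod_ages = rng.normal(loc=42, scale=8, size=1_000)    # shifted production distribution
print(check_feature_drift(train_ages, prod_ages))
```

In practice a check like this would run on a schedule for each important feature, and a detected drift would trigger an alert or a retraining job rather than just a printed result.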
Concept Drift

Consider a supervised machine learning problem with input data 'X' and an underlying target variable 'y'. Many ML models try to understand the relationship between X and y; in other words, they map the input 'X' to the target 'y' using a set of parameters that depends on the algorithm under the hood. Due to situations outside our control, however, the relationship between the input and the output can itself change over time. When this happens, a model deployed in real time starts to perform poorly. This is known as concept drift: the relationship between input and output changes over the course of days or months. Hence, it is handy to constantly monitor the data being streamed to the ML model in production, along with the quality of its predictions, to ensure it keeps performing well despite this phenomenon.
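One common way to catch concept drift is to track prediction error on a rolling window once ground-truth labels eventually arrive and compare it against the error measured during testing. The sketch below assumes such delayed labels are available; the window size and the 20% tolerance are illustrative assumptions.

```python
# A minimal sketch of concept-drift monitoring via rolling prediction error.
from collections import deque
from sklearn.metrics import mean_squared_error

class ConceptDriftMonitor:
    def __init__(self, baseline_mse, window_size=500, tolerance=0.20):
        self.baseline_mse = baseline_mse          # error measured on the test set
        self.tolerance = tolerance                # allowed relative degradation
        self.window = deque(maxlen=window_size)   # rolling window of (y_true, y_pred)

    def add_observation(self, y_true, y_pred):
        self.window.append((y_true, y_pred))

    def drift_suspected(self):
        if len(self.window) < self.window.maxlen:
            return False                          # not enough labelled data yet
        y_true, y_pred = zip(*self.window)
        recent_mse = mean_squared_error(y_true, y_pred)
        return recent_mse > self.baseline_mse * (1 + self.tolerance)
```

When `drift_suspected()` returns True, that is usually the cue to investigate the incoming data and consider retraining the model on more recent examples.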
Deprecation of Libraries

During model training and feature engineering, various libraries are used to get the best predictions and to make the overall workflow easier to follow and simpler to use. As time passes, however, these libraries get deprecated or some of their features change. Code that performed optimally when we first worked with the data and generated predictions may no longer behave well in the present environment. It is during these times that constant model monitoring comes in handy, so that the latest libraries and environments can be adopted deliberately and the model and its predictions can keep improving.
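A simple safeguard is to record the library versions used at training time and compare them with what is installed in the serving environment. The sketch below uses the standard library's importlib.metadata for this; the pinned versions shown are illustrative assumptions, not recommendations.

```python
# A minimal sketch that flags mismatches between training-time and
# serving-time library versions.
from importlib.metadata import version, PackageNotFoundError

TRAINING_VERSIONS = {
    "scikit-learn": "1.3.2",
    "numpy": "1.26.4",
    "pandas": "2.1.4",
}

def report_version_mismatches(pinned=TRAINING_VERSIONS):
    mismatches = {}
    for package, trained_with in pinned.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None                  # library missing from the environment
        if installed != trained_with:
            mismatches[package] = (trained_with, installed)
    return mismatches

# Any mismatch is a signal to re-test (and possibly re-train) the model.
print(report_version_mismatches())
```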
Pipeline Issues

To handle a constant stream of operations, we usually rely on pipelines, which let a large set of steps be executed together with ease. Operations such as feature standardization and dimensionality reduction are placed in pipelines so that they are easier to operate and friendlier to deploy. However, if we fail to constantly monitor the models, issues can creep into these pipelines, for example while continuously loading data and requesting predictions from the model. When we monitor our models, we have a much better chance of detecting these pipeline issues and ensuring that predictions are generated in real time without delays.
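Below is a minimal sketch of such a pipeline in scikit-learn (standardization, dimensionality reduction, then a model), together with a simple health check that could run on a schedule. The synthetic data, the expected feature count, and the 200 ms latency budget are illustrative assumptions.

```python
# A minimal sketch of a monitored prediction pipeline.
import time
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge

pipeline = Pipeline([
    ("scale", StandardScaler()),       # feature standardization
    ("reduce", PCA(n_components=5)),   # dimensionality reduction
    ("model", Ridge()),                # final estimator
])

# Fit on synthetic data standing in for the real training set.
X_train = np.random.default_rng(0).normal(size=(1_000, 20))
y_train = X_train[:, 0] * 3.0 + np.random.default_rng(1).normal(size=1_000)
pipeline.fit(X_train, y_train)

def health_check(pipe, sample_batch, expected_features=20, latency_budget_s=0.2):
    """Verify the pipeline accepts the expected input shape and responds quickly."""
    if sample_batch.shape[1] != expected_features:
        return {"ok": False, "reason": "unexpected number of input features"}
    start = time.perf_counter()
    pipe.predict(sample_batch)
    latency = time.perf_counter() - start
    return {"ok": latency <= latency_budget_s, "latency_seconds": latency}

print(health_check(pipeline, X_train[:10]))
```

A check like this catches broken input schemas and slow responses early, before they show up as missing or delayed predictions for the business.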
Conclusion
After deploying a model in real time, the next task is to constantly monitor its performance as well. If monitoring is not done at regular intervals, issues such as data drift, concept drift, pipeline failures, and deprecated libraries can creep in, along with other problems not covered here. Thank you for taking the time to read this article. Feel free to let me know your thoughts and comments.
Below are the ways where you could contact me or take a look at my work. Thanks.
GitHub: suhasmaddali (Suhas Maddali) (github.com)
LinkedIn: Suhas Maddali, Northeastern University, Data Science | LinkedIn
Medium: Suhas Maddali – Medium