7 MLOps ‘Smells’ That Show Where Your ML Process Is Lacking

Bad practices and other symptoms in your Machine Learning process that indicate a deeper problem

Vishnu Prathish
Towards Data Science


Code smells are a common way of identifying hidden problems in your code through a surface-level inspection. Here are some practices that reveal similar smells in your Machine Learning process. They increase the probability of hidden errors in your ML practice over the long term.

You are directly manipulating model artifacts.

After training is complete, data scientists sometimes copy models into standard locations such as an S3 bucket, where they are picked up by the production inference framework, or even picked up manually by another team for further processing.

This is just one example of how model artifacts get manipulated by hand in a data science company. There are several other versions of this that could be happening to you. Beyond the scalability issues of this practice, it exposes a lack of fundamental MLOps thinking in your Machine Learning process.

Fix this by having a CI/CD pipeline for managing your model artifacts. MLflow or SageMaker Pipelines provide a good starting point.
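
For example, instead of uploading a model file to S3 by hand, the training job can log and register the model so that versioning and hand-off happen through a registry. A minimal sketch, assuming an MLflow tracking server is configured (via MLFLOW_TRACKING_URI) and using a hypothetical model name:

```python
# Minimal sketch: log a trained model and register it in the MLflow Model
# Registry instead of copying files to S3 by hand.
# Assumptions: an MLflow tracking server with a registry backend is configured;
# the model name "churn-classifier" is hypothetical.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

with mlflow.start_run() as run:
    # Log the trained model as a run artifact in the tracking server.
    mlflow.sklearn.log_model(model, artifact_path="model")

# Register the logged artifact as a named, versioned model so downstream
# inference pulls it from the registry, not from an ad hoc S3 path.
registered = mlflow.register_model(
    model_uri=f"runs:/{run.info.run_id}/model",
    name="churn-classifier",
)
print(f"Registered version {registered.version}")
```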

You have difficulty reproducing data transformations for inference.

Sometimes a large number of transformations is applied to the data before training. This could be because of the unreliability of your source data, which makes your data prep (cleaning and anomaly removal) phase relatively long, or it could be the result of complex feature engineering steps you have to go through. Either way, since this is done in batch mode for training, the same transformations need to be reproduced correctly for inference as well.

But inference in production is a real-time process. Data needs to fit the model exactly as intended. If you have difficulty making this work at scale, that is indicative of a deeper problem. A good engineering pipeline is required to correctly translate all of these transformations to real time.
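
One common way to avoid re-implementing transformations for inference is to package them with the model itself. A minimal sketch using a scikit-learn Pipeline, with an illustrative dataset and file name:

```python
# Minimal sketch: bundle the data transformations with the model so that
# training (batch) and inference (real-time) run exactly the same code.
# The dataset and output file name are illustrative assumptions.
import joblib
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X_train, y_train = make_classification(n_samples=500, n_features=5, random_state=0)

pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # cleaning step
    ("scale", StandardScaler()),                   # feature scaling step
    ("model", LogisticRegression(max_iter=1000)),  # the actual estimator
])
pipeline.fit(X_train, y_train)

# Serialize the whole pipeline; the inference service loads this one object
# and calls predict() on raw features, with no hand-reimplemented transforms.
joblib.dump(pipeline, "model_with_preprocessing.joblib")
```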

You use local notebooks.

Local Jupyter notebooks (a server running on your own machine) are a single point of potential data and code loss. They are also often misplaced and not shared correctly between engineers and data scientists, and code written this way is often not version controlled.

Introducing a git repo can contain this problem, but it is still the responsibility of data scientists to version control their code. Using Amazon SageMaker Studio or JupyterHub, and asking the team to work there instead, can help with this issue.

Your model building process lacks standards, patterns, and custom libraries.

Another side effect of local Jupyter notebooks is that design patterns in them evolve independently, and often in different directions.

This also creates a situation in which a lot of code gets duplicated and not unit tested correctly. For example, two data scientists might use two different methods (both correct) to remove a particular type of anomaly. While this might not be an immediate issue, you are losing an opportunity to standardize by building a central data cleaning repo for your org, which will pay off in the long run.

A standardized notebook repo, with common functions built as supporting libraries, is therefore required. This standardization covers both organizational code policies and a predefined notebook structure. The structure can be separated by phases of data science such as wrangling, feature building, training, and evaluation.
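
As an illustration, the duplicated anomaly-removal step above could instead live as one unit-tested function in a shared library that every notebook imports. The module path and function below are hypothetical:

```python
# Hypothetical sketch of a shared, unit-tested cleaning helper that both data
# scientists would import instead of each writing their own anomaly removal in
# a notebook. The module path (ml_common.cleaning) and function are assumptions.
import pandas as pd


def remove_iqr_outliers(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Drop rows whose `column` value falls outside the Tukey IQR fence."""
    q1, q3 = df[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return df[(df[column] >= lower) & (df[column] <= upper)]


# In a notebook this would be: from ml_common.cleaning import remove_iqr_outliers
df = pd.DataFrame({"latency_ms": [10, 12, 11, 13, 500]})
print(remove_iqr_outliers(df, "latency_ms"))  # the 500 ms outlier is dropped
```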

You do not have real-time drift detection mechanisms in place.

Data and concept drift are common problems. Irrespective of the method used, generically and automatically tracking drift requires a strong understanding of your feature space. When several model types are built, this understanding needs to be generalized and templated. This task largely falls to a data engineer and requires a good pipeline to work correctly.

Not having a drift detection pipeline might not seem like a big deal if you do not have real-time retraining needs. However, putting this monitor in place will expose the fundamental flaws in your MLOps process: processes that ‘just worked’ previously will start to fall apart, exposing hidden issues.
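
A basic version of such a monitor can be as simple as comparing each live feature’s distribution against its training baseline. A minimal sketch using a two-sample Kolmogorov-Smirnov test; the data and alert threshold are illustrative:

```python
# Minimal sketch: per-feature drift check comparing a recent window of live
# traffic against the training baseline with a two-sample KS test.
# The synthetic data and the 0.01 alert threshold are assumptions to tune.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
training_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)  # baseline snapshot
live_feature = rng.normal(loc=0.4, scale=1.0, size=1_000)      # recent, shifted window

stat, p_value = ks_2samp(training_feature, live_feature)
if p_value < 0.01:
    print(f"Possible drift: KS statistic={stat:.3f}, p-value={p_value:.4f}")
```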

Your model versioning, experiment tracking, retraining, and monitoring process is manual.

If models aren’t versioned and previous experiments aren’t saved, you will have zero traceability when things go wrong. Model monitoring also tracks latency issues, inference errors, and anomalies, and automated retraining can improve your models when drift is detected. While the benefits of having these capabilities are obvious, not having them (or having a very manual version of them) indicates an immature ML practice.
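
As a starting point for experiment tracking, each training run can record its parameters and metrics automatically. A minimal sketch with MLflow; the experiment name, parameters, and metric value are placeholders:

```python
# Minimal sketch: record an experiment run so that parameters and metrics are
# traceable later, instead of living only in someone's notebook or memory.
# Assumptions: MLflow is available; names and values below are placeholders.
import mlflow

mlflow.set_experiment("demand-forecast")

with mlflow.start_run():
    params = {"max_depth": 6, "n_estimators": 200}
    mlflow.log_params(params)
    # ... train and evaluate the model with `params` here ...
    mlflow.log_metric("validation_auc", 0.91)
```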

Thank you for reading

It is important to note that none of this means you have a ‘bad’ ML process. Most data science companies start out by building a model to solve the problem at hand, often in a local Jupyter notebook, without any of the checks and balances I talked about today. But as you mature and scale, some of this needs to be automated.

Unfortunately, these practices stick around and are still very common, even with the MLOps industry making big strides over the past couple of years.

If you have made it this far, I have one tip for you: new data science practices can use one of the cloud ML suites end to end (Amazon SageMaker, Azure ML, Google Cloud ML, DataRobot) to largely avoid everything I just described. However, most cloud providers are still a long way from fully figuring out ML pipelines. You will have to mature your practice along with them, and worry about vendor lock-in.

Let me know in the comments if you can think of more smells that I have missed.
