This series of blog posts started life as my attempt to get a grasp on machine learning operations (MLOps); however, the task quickly spiralled into a broad review of machine learning (ML) adoption and tooling in general. As I quickly discovered, MLOps represents the cutting edge of ML in production, and so only a relatively small set of leading tech companies (think FAANG), forward-thinking startups, and, increasingly, research papers are specifically focussed on this area. This raises the question of why, in 2022, MLOps adoption, and ML adoption more generally, is still in its early stages for most companies. I believe addressing this question is critical to understanding both the current state of ML tooling and the trends to expect in the near future.
The key questions I aim to address in this series are:
- What is the level of ML maturity in industry?
- What is the state of ML tooling and adoption?
- What are likely trends in ML tooling?
This blog post is concerned with the first question.
This series of blog posts is by no means meant to be exhaustive – or necessarily even correct in places! I wrote this to try to organise my thinking on the reading I’ve done in recent weeks, and I want it to become a jumping-off point for further discussion. This is an absolutely fascinating field and I am really keen to learn more about the industry, so please get in touch!
MLOps
MLOps refers to a collection of practices and tools used to aid the deployment and maintenance of ML models in production. As the name implies, it is a cousin of DevOps, which similarly relates to practices and tools for managing the quality and deployment of software. MLOps has arisen due to increased awareness of the unique challenges of deploying and maintaining ML models in production, as well as an appreciation that, within any deployment process, the ML-specific elements are only a very small part of the necessary infrastructure (Sculley, 2015).
Similar to DevOps, MLOps represents a cultural shift within the industry promoting, among other things, Agile practices and end-to-end ownership of products and services. It is this latter consideration in particular that helps to explain the ubiquity of end-to-end ML platforms, which offer a range of services touching on all the major components of a typical ML workflow. An ML platform’s utility typically follows its ability to abstract away lower-level details (Felipe & Maya, 2021), meaning these platforms are usually deployed on top of a managed infrastructure layer specifically to reduce the operational burden placed on engineering teams (Paleyes, 2021). These platforms aim to reduce the time required to build and deliver models, as well as to maintain the stability and reproducibility of predictions. Looking at the more established companies in this area, we see a tendency to develop ML platforms in-house (Symeonidis, 2022), in large part due to the highly context-specific nature of pipeline implementations. We will look at in-house vs third-party platforms and tooling more closely in the next blog post.
Distinct from DevOps, MLOps deals with three main artefacts: data, model, and code (Felipe & Maya, 2021). ML projects have a hard requirement on data, which does not fit well within existing software management practices (Paleyes, 2021); in particular, the initial steps of data preparation follow a more waterfall-like approach (Mäkinen, 2021). Additionally, each artefact introduces distinct challenges and has its own development cycle: the data development cycle is typically much faster than the code development cycle, and whereas in traditional software engineering the code is the hard part, in ML it is only one concern among several. The combination of distinct artefacts and their attendant needs goes some way to explaining the complexity of MLOps and the size of the tooling ecosystem (Symeonidis, 2022). At the process level, MLOps adds the principle of Continuous Training (CT) to the CI/CD mix.
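To make the CT idea concrete, here is a minimal sketch of the kind of trigger that sits behind it, assuming a drift check of live data against training data; the function names, drift score, and threshold are all illustrative assumptions on my part rather than any particular tool’s API:

```python
"""Minimal sketch of a Continuous Training (CT) trigger.
All names and thresholds are illustrative assumptions, not
part of any specific MLOps framework or tool."""

import statistics


def population_shift(reference: list[float], live: list[float]) -> float:
    """Crude drift score: shift in the feature mean, scaled by the
    reference standard deviation."""
    ref_std = statistics.stdev(reference) or 1.0
    return abs(statistics.mean(live) - statistics.mean(reference)) / ref_std


def maybe_retrain(reference: list[float], live: list[float],
                  threshold: float = 0.5) -> bool:
    """Trigger a retraining job when live data drifts past the threshold --
    the 'CT' step that MLOps adds on top of CI/CD."""
    if population_shift(reference, live) > threshold:
        print("Drift detected: triggering retraining pipeline")
        return True
    print("No significant drift: keeping the current model")
    return False


if __name__ == "__main__":
    training_feature = [0.9, 1.0, 1.1, 1.0, 0.95]
    production_feature = [1.8, 2.1, 1.9, 2.0, 2.2]
    maybe_retrain(training_feature, production_feature)
```

In a real pipeline, the `maybe_retrain` decision would kick off an orchestrated training job rather than a print statement, but the shape of the loop is the same.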
Also, and this might just be me, but there is some confusion around MLOps terminology and little agreement on its scope: especially given the iterative nature of ML, it can be hard to draw lines between what are distinctly MLOps concerns vs DataOps concerns, etc. Relatedly, it can be difficult to talk about MLOps maturity as distinct from ML maturity. For the purpose of this blog, I will use "ML maturity" to refer to increasing experience, standardisation, and operationalisation across all elements of the ML workflow, and "MLOps maturity" to refer specifically to the operational aspects, i.e. once the model is in production.
ML Workflows
Before discussing ML maturity directly, it makes sense to first introduce the concept of an ML workflow, which is one of the defining elements of ML maturity. An ML workflow corresponds to a set of formalised and repeatable processes used to develop and deploy ML models in production. Although the specific steps and their orchestration remain debated (Brinkmann & Rachakonda, 2021), an oft-cited outline workflow is given by Ashmore et al. (2019), which highlights the key high-level stages:
- Data management: all the steps needed to get data into a state fit for model training, namely data collection, preprocessing, augmentation, and analysis.
- Model learning: model training and optimisation happen here.
- Model verification: evaluation against various business metrics and regulatory frameworks.
- Model deployment: including monitoring and updating production models.
In practice, each stage breaks down into smaller, more defined steps, and as written, the list does not impose a specific order or sequence (Paleyes, 2021). Note that I use the term "workflow" here interchangeably with "lifecycle"; however, the latter is occasionally used to refer specifically to everything after model verification (Ashmore, 2019).
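To make these stages concrete, a toy pass through all four might look like the sketch below. This uses scikit-learn purely for illustration, and the mapping of code sections onto Ashmore et al.’s stage names is my own assumption:

```python
"""Toy pass through the four stages of (Ashmore, 2019) using scikit-learn.
The stage boundaries are an illustrative mapping, not a prescribed API."""

import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Data management: collect, preprocess, and split the data.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(X_train)

# Model learning: train (and, in practice, optimise) the model.
model = LogisticRegression(max_iter=1000).fit(scaler.transform(X_train), y_train)

# Model verification: evaluate against the metrics the business cares about.
accuracy = accuracy_score(y_test, model.predict(scaler.transform(X_test)))
assert accuracy > 0.9, "Model fails the (illustrative) acceptance criterion"

# Model deployment: in production this covers serving, monitoring, and
# updating; here we simply persist the trained artefacts.
joblib.dump({"scaler": scaler, "model": model}, "model.joblib")
```

The gap between this toy script and a production system is, of course, exactly what the rest of this post is about.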
Regarding specific implementations of ML workflows in industry, examples from NVIDIA, Facebook, Spotify, and Google point to an emerging consensus around a "canonical" ML workflow (Felipe & Maya, 2021), at least at the architectural level; the differences are largely a consequence of the specific use case and other organisational concerns, which may not be representative of the industry at large. However, there is as yet no corresponding "canonical" ML tech stack (Jeffries, 2020), with many of these documented ML workflows being implemented with in-house tooling (Chan, 2021) – spoilers!
ML Maturity Frameworks
There are a number of frameworks that try to illustrate the different ML/MLOps maturity levels, most notably those by Google (MLOps: Continuous Delivery and Automation Pipelines in Machine Learning, 2020) and Microsoft (Machine Learning Operations Maturity Model – Azure Architecture Center). They are fairly similar, demonstrating how the route to full MLOps adoption requires increasing levels of automation around development, deployment, and monitoring processes. However, as both frameworks focus solely on the operational side of things, neither really helps to clarify when, in a company’s overall adoption of ML, it should focus on operational concerns specifically. Frameworks that come closer to this are provided by Algorithmia (2018) and Mäkinen et al. (2021). The Algorithmia white paper provides the most general definition of MLOps maturity: "We define MLOps maturity as an organisation’s ability to drive significant business value with their ML models." This is measured against six dimensions: organisational alignment, data, training, deployment, management, and governance. The paper by Mäkinen et al. (2021) is much simpler, stating that increasing ML maturity can be understood as moving through the following stages:
- Data-centric: Focus is on various data management issues
- Model-centric: Focus is on building and productionising first few models
- Pipeline-centric: Has models in production, focus is on operational issues
It is only in the "pipeline-centric" stage that MLOps concerns are specifically addressed. Furthermore, the movement between these stages should see corresponding organisational changes (Mäkinen, 2021). This framework does a much better job of documenting ML maturity as a whole, but I would say the categories are not well named: the terms "data-centric" and "model-centric" are usually taken to refer to the point of focus of an ML workflow as a whole, rather than to levels of ML adoption (Strickland, 2022).
Taken together, the latter two frameworks are much more widely applicable and underline the following points:
- ML maturity generally follows increasing progression through and efficiencies around the ML workflow
- MLOps maturity is a continuation of ML maturity, and in many ways is the end goal of ML maturity, i.e. it doesn’t make sense to talk about one without the other
- True MLOps maturity can only be achieved once the other elements of ML maturity have been addressed
ML Maturity in Industry
What was most striking to me when putting this piece together was the general lack of ML maturity and awareness outside of the tech leaders; amongst larger companies, some 70% have only started AI/ML investment in recent years (Dimensional Research, 2019), and these companies are typically not AI/ML/data companies. It is hard to say exactly why this is the case; however, although Big Data, ML, and AI have had a commercial presence for well over a decade now (Hadoop was initially released in 2006 (Wikipedia)), they have only become practical for most companies in the last few years. This is primarily due to the increased availability of affordable cloud-based data warehouses and lakehouses with adequate tooling (Turck, 2021), and can be understood as part of a longer-term trend where, as Matt Turck puts it, "every company is becoming not just a software company, but also a data company" (Turck, 2021).
In terms of general ML adoption, broadly speaking there are two key groups of companies at either extreme of the ML maturity spectrum: those only just making their first steps, and leaders at the cutting edge. Linking back to the framework discussed above, we could term these "data-centric" and "pipeline-centric" respectively. Regardless, in terms of the ML workflow given by Ashmore et al. (2019), most companies report that the majority of their project time is spent on the data management stage (Shankar, 2021). A full list of issues and concerns, broken down by ML workflow stage, is given in Paleyes et al. (2021).
Specific to companies less mature in ML, surveys of both relatively large (Dimensional Research, 2019) and relatively small (dotscience, 2019) organisations highlight data issues as the key blocker in data projects: 96% of respondents encountered some data quality issue (Dimensional Research, 2019). These issues include data availability, quality and quantity, labelling, and accessibility. Other frequently cited problems include a general lack of appropriate tooling, a lack of expertise, and budgetary constraints. Taken as a whole, 78% of projects stalled before deployment (Dimensional Research, 2019).
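As a flavour of what catching these issues can look like in practice, even a minimal set of data quality checks covers several of the surveyed failure modes; the specific checks and report format in the sketch below are illustrative assumptions on my part:

```python
"""Minimal data quality report covering some of the surveyed issues.
The checks and report format are illustrative assumptions."""

import pandas as pd


def basic_quality_report(df: pd.DataFrame, label_column: str) -> dict:
    return {
        # Availability/quantity: do we have enough rows at all?
        "row_count": len(df),
        # Quality: what fraction of all cells is missing?
        "missing_fraction": float(df.isna().mean().mean()),
        # Labelling: how many rows lack a label?
        "unlabelled_rows": int(df[label_column].isna().sum()),
        # Quality: exact duplicates inflate the apparent data quantity.
        "duplicate_rows": int(df.duplicated().sum()),
    }


if __name__ == "__main__":
    frame = pd.DataFrame({"feature": [1.0, None, 3.0, 3.0],
                          "label": ["a", "b", None, "a"]})
    print(basic_quality_report(frame, label_column="label"))
```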
Another interesting finding relates to hyperparameter optimisation, a key step in model training: 24.6% of respondents neglected this entirely and 59.2% performed it manually, with few respondents reporting the use of third-party tools (dotscience, 2019). This may again be a consequence of these organisations’ immaturity in the field, where it can be more efficient to select from the configuration space manually, especially for an early ML project. Relatedly, it is often noted that computing power, rather than tooling or setup, is the bottleneck for hyperparameter optimisation (Huyen, 2020).
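For readers unfamiliar with the step, the gap between "manual" and "tooled" hyperparameter optimisation is roughly the difference between the two halves of the sketch below, here using scikit-learn’s built-in grid search with an arbitrary parameter grid of my own choosing; the comment on cost also hints at why compute becomes the bottleneck:

```python
"""Manual vs. tooled hyperparameter optimisation; the grid is arbitrary."""

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Manual approach: hand-pick a configuration and hope for the best.
manual_model = SVC(C=1.0, kernel="rbf").fit(X, y)

# Tooled approach: exhaustively search a small grid with cross-validation.
# Cost grows fast: 3 values of C x 2 kernels x 5 CV folds = 30 model fits,
# which is why compute, not tooling, is usually the bottleneck.
search = GridSearchCV(SVC(), param_grid={"C": [0.1, 1.0, 10.0],
                                         "kernel": ["linear", "rbf"]}, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```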
Wrap-up
This post discussed the current state of ML maturity in industry and MLOps in general. The main takeaways were the general immaturity of ML adoption across industry as a whole, as well as the scale of the technological, process, and cultural challenges that full MLOps adoption poses. These two elements are highly relevant for understanding the current state of ML tooling, which is highly diverse and emblematic of an industry yet to find consensus.
References
Algorithmia. (2018). Machine learning in production: a roadmap for success.
Ashmore, R. et al. (2019). Assuring the machine learning lifecycle: Desiderata, methods, and challenges. ACM Computing Surveys, 54(5), 1–30.
Brinkmann, D., & Rachakonda, V. (2021, April 6). MLOps Investments // Sarah Catanzaro // Coffee Session #33. YouTube. Retrieved May 2, 2022, from https://www.youtube.com/watch?v=twvHm8Fa5jk
Chan, E. (2021, May 12). Lessons on ML Platforms – from Netflix, DoorDash, Spotify, and more. Towards Data Science. Retrieved April 28, 2022, from https://towardsdatascience.com/lessons-on-ml-platforms-from-netflix-doordash-spotify-and-more-f455400115c7
Dimensional Research. (2019). Artificial Intelligence and Machine Learning Projects Are Obstructed by Data Issues.
dotscience. (2019). The State of Development and Operations of AI Applications.
Felipe, A., & Maya, V. (2021). The State of MLOps.
Huyen, C. (2020, June 22). What I learned from looking at 200 Machine Learning tools. https://huyenchip.com/2020/06/22/mlops.html
Jeffries, D. (2020, October 13). Rise of the Canonical Stack in Machine Learning. Towards Data Science. Retrieved May 2, 2022, from https://towardsdatascience.com/rise-of-the-canonical-stack-in-machine-learning-724e7d2faa75
Machine Learning Operations Maturity Model – Azure Architecture Center. Microsoft Docs. Retrieved May 2, 2022, from https://docs.microsoft.com/en-us/azure/architecture/example-scenario/mlops/mlops-maturity-model
Mäkinen, S. et al. (2021). Who Needs MLOps: What Data Scientists Seek to Accomplish and How Can MLOps Help? https://arxiv.org/abs/2103.08942
MLOps: Continuous delivery and automation pipelines in machine learning. (2020, January 7). Google Cloud. Retrieved May 2, 2022, from https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning
Paleyes, A. et al. (2021). Challenges in Deploying Machine Learning: a Survey of Case Studies.
Sculley, D. et al. (2015). Hidden Technical Debt in Machine Learning Systems.
Shankar, S. (2021, December 13). The Modern ML Monitoring Mess: Categorizing Post-Deployment Issues (2/4). Shreya Shankar. Retrieved April 30, 2022, from https://www.shreya-shankar.com/rethinking-ml-monitoring-2/
Strickland, E. (2022, February 9). Andrew Ng: Unbiggen AI. https://spectrum.ieee.org/andrew-ng-data-centric-ai
Symeonidis, G. et al. (2022). MLOps – Definitions, Tools and Challenges.
Turck, M. (2021, September 28). Red Hot: The 2021 Machine Learning, AI and Data (MAD) Landscape. https://mattturck.com/data2021/
Apache Hadoop. Wikipedia. Retrieved May 2, 2022, from https://en.wikipedia.org/wiki/Apache_Hadoop