
6 Research Papers about Machine Learning Deployment Phase

Adopting The Academic Mindset and Habits

Photo by Annie Spratt on Unsplash

A common beginner’s mistake is to ignore research. Reading research papers is daunting, especially when you’re not from an academic background, like me. Nonetheless, it ought to be done.

Ignoring research can easily leave you falling behind in your skill set, because research maps out the current problems the field is grappling with. Remaining relevant as a machine learning practitioner therefore involves adopting the academic mindset and habits [to some degree].


For my studies, I’ve curated 6 research papers that I will be reading to learn more about machine learning deployment. Here they are, in no particular order:

1.

Challenges in Deploying Machine Learning: A Survey of Case Studies, Paleyes et al., Jan 2021

Machine learning practitioners and researchers face a number of challenges during the deployment of machine learning models in production systems.

"This survey reviews published reports of deploying machine learning solutions in a variety of use cases, industries, and applications and extracts practical considerations corresponding to stages of the machine learning deployment workflow. Our survey shows that practitioners face challenges at each stage of the deployment. The goal of this paper is to lay out a research agenda to explore approaches addressing these challenges."

2.

Hidden Technical Debt in Machine Learning Systems, Sculley et al., Dec 2015

This widely cited paper attempts to document the realities of machine learning in the real world from a cost perspective. The paper states "Machine learning offers a fantastically powerful toolkit for building useful complex prediction systems quickly. This paper argues it is dangerous to think of these quick wins as coming for free."

Essentially, the paper explores the ML-specific risks involved in implementing machine learning in the real world.

3.

A Systems Perspective to Reproducibility in Production Machine Learning Domain, Ghanta et al., Jun 2018

Logistics is the part of machine learning that rarely gets bragged about, yet it matters enormously. To reproduce machine learning pipelines that have been deployed in production, practitioners must capture both the historic state of the model and its current state. This is an extremely complex task, but this paper claims to offer some solutions.

"We present a system that addresses these issues from a systems perspective, enabling ML experts to track and reproduce ML models and pipelines in production. This enables quick diagnosis of issues that occur in production."
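To make the idea concrete, here is a minimal Python sketch of my own, not the system described in the paper: it records a fingerprint of the training data, the hyperparameters, the random seed, and the resulting model, then replays the run and checks that the model comes out identical. The names RunRecord, fingerprint, train, run_and_record, and reproduce are hypothetical, and the toy train function stands in for a real training job.

```python
import hashlib
import json
import random
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical record of everything needed to replay one training run."""
    data_sha256: str
    params: dict
    seed: int
    model_sha256: str


def fingerprint(obj) -> str:
    # Canonical JSON (sorted keys) so equal contents always hash the same.
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def train(data, params, seed):
    # Toy stand-in for a real training job: seeded "weights" derived from data.
    rng = random.Random(seed)
    scale = params["scale"]
    return [round(x * scale + rng.random(), 6) for x in data]


def run_and_record(data, params, seed) -> RunRecord:
    model = train(data, params, seed)
    return RunRecord(fingerprint(data), dict(params), seed, fingerprint(model))


def reproduce(record: RunRecord, data) -> bool:
    # Refuse to replay against data that has changed since the original run.
    if fingerprint(data) != record.data_sha256:
        raise ValueError("training data differs from the recorded snapshot")
    model = train(data, record.params, record.seed)
    return fingerprint(model) == record.model_sha256


data = [1.0, 2.0, 3.0]
record = run_and_record(data, {"scale": 0.5}, seed=42)
print(json.dumps(asdict(record), indent=2))
print(reproduce(record, data))  # True
```

Storing a record like this next to every deployed model is one way to keep both the historic and the current state recoverable. A real system would also need to pin code and library versions, which this sketch ignores.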

4.

Software Engineering for Machine Learning: A Case Study, Amershi et al., May 2019

Unlike many companies, Microsoft has been implementing machine learning for many years. Drawing on that wealth of experience, Microsoft shares what it believes should serve as a set of best practices for other organizations developing AI applications and data science tools.

"We have identified three aspects of the AI domain that make it fundamentally different from prior software application domains:

  1. Discovering, managing, and versioning the data needed for machine learning applications is much more complex and difficult than other types of software engineering
  2. Model customization and model reuse require very different skills than are typically found in software teams
  3. AI components are more difficult to handle as distinct modules than traditional software components – models may be "entangled" in complex ways and experience non-monotonic error behavior."
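The first point, data versioning, is easier to picture with an example. Below is a minimal Python sketch, assuming a small dataset that fits in memory: each version is identified by a hash of its contents, so committing identical data twice yields the same version id, and older versions can always be checked out again. DatasetStore and its methods are hypothetical names, not Microsoft’s tooling.

```python
import hashlib
import json


class DatasetStore:
    """Hypothetical in-memory, content-addressed store for dataset versions."""

    def __init__(self):
        self._snapshots = {}  # version id -> stored copy of the rows
        self._history = []    # version ids in commit order

    def commit(self, rows):
        # Serialise canonically so the same rows always get the same id.
        blob = json.dumps(rows, sort_keys=True).encode("utf-8")
        version = hashlib.sha256(blob).hexdigest()[:12]
        if version not in self._snapshots:
            self._snapshots[version] = json.loads(blob)  # defensive deep copy
            self._history.append(version)
        return version

    def checkout(self, version):
        # Return a copy so callers cannot mutate the stored snapshot.
        return json.loads(json.dumps(self._snapshots[version]))

    def log(self):
        return list(self._history)


store = DatasetStore()
v1 = store.commit([{"text": "great product", "label": 1}])
v2 = store.commit([{"text": "great product", "label": 1},
                   {"text": "broke in a week", "label": 0}])
print(store.log())         # two distinct version ids
print(store.checkout(v1))  # the original single-row dataset
```

Content addressing means the version id itself tells you which data a model saw. The genuinely hard parts the quote points to (discovering and managing data at scale) are exactly what this toy leaves out.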

5.

The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction, Breck et al., Dec 2017

Of all the papers on my list, this is the one I am least familiar with (I only came across it recently). In the abstract, the authors state that they will "present 28 specific tests and monitoring needs, drawn from experience with a wide range of production ML systems to help quantify these issues and present an easy to follow roadmap to improve production readiness and pay down ML technical debt."
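As I understand the paper’s scoring scheme, each test earns half a point if it is run manually with documented results and a full point if it runs automatically on a repeated basis; points are summed within each of four sections (data, model, infrastructure, and monitoring), and the final ML Test Score is the minimum of the four section totals. The Python sketch below encodes that reading, so please check the paper for the exact rules. CheckResult, points_for, and ml_test_score are hypothetical names, and the test labels are illustrative placeholders, not the paper’s 28 tests.

```python
from dataclasses import dataclass

# The four sections of the rubric, as I read the paper.
SECTIONS = ("data", "model", "infrastructure", "monitoring")


@dataclass
class CheckResult:
    section: str
    name: str
    manual: bool = False     # run by hand, with results documented and shared
    automated: bool = False  # run automatically on a repeated basis


def points_for(result: CheckResult) -> float:
    # Full point for automation, half a point for a documented manual run.
    if result.automated:
        return 1.0
    if result.manual:
        return 0.5
    return 0.0


def ml_test_score(results):
    totals = {section: 0.0 for section in SECTIONS}
    for r in results:
        totals[r.section] += points_for(r)
    # The overall score is the weakest section, not the average.
    return min(totals.values()), totals


results = [
    CheckResult("data", "schema check", automated=True),
    CheckResult("model", "model spec review", manual=True),
    CheckResult("infrastructure", "reproducible training", automated=True),
    CheckResult("monitoring", "dependency change alerts"),
]
score, per_section = ml_test_score(results)
print(per_section)  # {'data': 1.0, 'model': 0.5, 'infrastructure': 1.0, 'monitoring': 0.0}
print(score)        # 0.0
```

Taking the minimum rather than the average means a team can’t hide a neglected area, such as monitoring, behind a strong score elsewhere.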

6.

Building a Reproducible Machine Learning Pipeline, Sugimura, P. & Hartl, F., Oct 2018

All machine learning practitioners, whether in industry or academia, need to build reproducible models. Failing to do so can result in significant financial loss, lost time, and damage to personal reputation if there is no way to recover past experiments. This paper covers various reproducibility challenges that practitioners may face throughout the lifecycle of a machine learning workflow. It then describes a framework, created by the authors, for overcoming those challenges.

"The framework is comprised of four main components (data, feature, scoring, and evaluation layers), which are themselves comprised of well-defined transformations. This enables us to not only exactly replicate a model, but also to reuse the transformations across different models. As a result, the platform has dramatically increased the speed of both offline and online experimentation while also ensuring model reproducibility."
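Here is a minimal Python sketch of the layered idea, not the authors’ platform: each layer is a pure function over rows, and a compose helper chains them so the same data and feature transformations can be reused by more than one model. The compose helper and the specific transformations (drop_missing, add_log_amount, a fixed threshold standing in for a trained model, and accuracy as the evaluation step) are hypothetical.

```python
import math
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Transform = Callable[[List[Row]], List[Row]]


def compose(*steps: Transform) -> Transform:
    """Chain pure transformations so the same steps can be reused across models."""
    def run(rows: List[Row]) -> List[Row]:
        for step in steps:
            rows = step(rows)
        return rows
    return run


# Data layer: drop rows with missing values (returns new rows, never mutates).
def drop_missing(rows: List[Row]) -> List[Row]:
    return [dict(r) for r in rows if all(v is not None for v in r.values())]


# Feature layer: derive a log-scaled feature from a raw column.
def add_log_amount(rows: List[Row]) -> List[Row]:
    return [{**r, "log_amount": math.log1p(r["amount"])} for r in rows]


# Scoring layer: a fixed threshold rule stands in for a trained model.
def score(rows: List[Row]) -> List[Row]:
    return [{**r, "pred": 1.0 if r["log_amount"] > 2.0 else 0.0} for r in rows]


# Evaluation layer: reduce scored rows to a single metric.
def accuracy(rows: List[Row]) -> float:
    return sum(r["pred"] == r["label"] for r in rows) / len(rows)


features = compose(drop_missing, add_log_amount)  # reusable by any model
model_a = compose(features, score)

raw = [
    {"amount": 3.0, "label": 0.0},
    {"amount": 50.0, "label": 1.0},
    {"amount": None, "label": 1.0},
]
print(accuracy(model_a(raw)))  # 1.0
```

Because every step is deterministic and never mutates its input, running model_a(raw) twice gives the same result, and a second model could reuse features with a different scoring step.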


This list is by no means exhaustive. Andrew Ng suggests that practitioners read [and understand] 50–100 papers on a subject to develop a very deep understanding of the domain’s requirements.

Understanding research papers does not come from reading lots of research alone. You may need to alternate between papers and other trusted online resources, such as blog posts and video content. To that end, I’ve added some valuable resources to make understanding machine learning deployments easier.

Wrap Up

Many practitioners fall into the trap of thinking that they don’t need to read research papers; this is especially common among practitioners who aren’t from an academic background (like me). Ignoring research can easily lead to falling behind in the field, hence it’s important to adopt an academic mindset and habits while still applying yourself practically.

Thanks for Reading!

If you enjoyed this article, connect with me by subscribing to my FREE weekly newsletter. Never miss a post I make about Artificial Intelligence, Data Science, and Freelancing.

Related Articles

Building Reproducible Machine Learning Pipelines

Machine Learning Model Deployment

4 Machine Learning System Architectures

The Machine Learning WorkFlow

