Bridging AI’s Proof-of-Concept to Production Gap

We often hear about exciting breakthroughs and new state-of-the-art research in Artificial Intelligence (AI) and machine learning (ML).
Despite these advances, the harsh reality **is that most of these proof-of-concept (PoC) projects are not deployed in the real world. Why is that so?**
This problem is known as the PoC-to-Production gap, where ML projects encounter significant bottlenecks and challenges on their route towards practical deployment.
In this article, we explore three key lessons shared by Andrew Ng on how teams can bridge this gap and deliver real value from ML projects.
About the Speaker

Andrew Ng is the founder of deeplearning.ai and co-founder of Coursera.
He is currently an Adjunct Professor at Stanford University. He was the Chief Scientist at Baidu Inc. and Founder of the Google Brain Project.
He is also the founder and CEO of Landing AI.
(1) Small Data Problems
Discussion of ML applications in industry has mainly centered on consumer internet use cases such as social media (e.g., Facebook), entertainment (e.g., YouTube), and e-commerce (e.g., Amazon).
This phenomenon is not surprising given that these companies have access to massive amounts of data from millions of users.
However, there are many other applications and industries beyond consumer internet that only have access to small data.
Small data is a big challenge in the deployment of AI projects because it drastically amplifies the problem of skewed data distributions. While we can build ML models that deliver good accuracy on average, those models may still do poorly on rare occurrences that are under-represented in the data.
An example is the development of a chest X-ray diagnostic ML algorithm that predicts well on common conditions like effusion but predicts poorly on rare diseases like hernia.
While the overall model may have excellent performance metrics, deploying a system that misses obvious hernia cases would be medically unacceptable.
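To make this failure mode concrete, here is a minimal sketch (using scikit-learn with hypothetical labels, not data from the talk) of how near-perfect overall accuracy can coexist with zero recall on a rare class:

```python
import numpy as np
from sklearn.metrics import classification_report

# Hypothetical labels: "hernia" is deliberately rare (10 of 1,000 cases),
# and the model misclassifies every one of them.
y_true = ["effusion"] * 450 + ["normal"] * 540 + ["hernia"] * 10
y_pred = ["effusion"] * 450 + ["normal"] * 540 + ["effusion"] * 10

accuracy = np.mean(np.array(y_true) == np.array(y_pred))
print(f"Overall accuracy: {accuracy:.1%}")  # 99.0% — looks excellent

# The per-class breakdown exposes 0% recall on the rare "hernia" class.
print(classification_report(y_true, y_pred, zero_division=0))
```

Reporting per-class (or per-subgroup) metrics like this is a cheap way to catch such failures before deployment.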
Fortunately, there is progress in researching better algorithms to handle small data. Here are some examples:
- Synthetic data generation (e.g., GANs)
- One/Few-shot Learning (e.g., GPT-3)
- Self-supervised Learning
- Transfer Learning
- Anomaly Detection
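
To illustrate one item on this list, here is a minimal transfer-learning sketch in Keras: an ImageNet-pretrained backbone is frozen and only a small classification head is trained, which is often a reasonable starting point when labelled data is scarce. The `train_ds`/`val_ds` datasets are placeholders, not part of the talk.

```python
import tensorflow as tf

# Load an ImageNet-pretrained backbone and freeze its weights so the
# small dataset only has to fit a lightweight classification head.
base = tf.keras.applications.ResNet50(include_top=False, weights="imagenet",
                                      input_shape=(224, 224, 3), pooling="avg")
base.trainable = False

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(2, activation="softmax"),  # e.g., normal vs. abnormal
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# `train_ds` / `val_ds` are assumed to be small, labelled tf.data datasets.
# model.fit(train_ds, validation_data=val_ds, epochs=10)
```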

Even if you only have small data, it is almost always a terrible idea to wait for more data (e.g., by first building new IT infrastructure) before developing ML solutions in an enterprise.
It would be more prudent to start with whatever data you have and work backward to determine what additional data you need to collect. It is only by starting with a working system that you can figure out how to build or enhance the IT infrastructure.
(2) Generalizability and Robustness
It turns out that many models that perform well in published papers do not work in production settings.
This problem stems from building poorly generalizable and non-robust models, and it proves to be yet another challenge in deploying ML proofs-of-concept.
Going back to the earlier example of X-ray diagnostics, a team may build a performant algorithm based on high-quality X-ray images obtained from modern machines operated by well-trained technicians at a reputable institution like Stanford Hospital.
If we were to run this model in an older hospital with outdated X-ray machines, less well-trained technicians, and different imaging protocols, we would expect the model’s performance to degrade.
This performance contrasts with a human radiologist, who would likely diagnose the scans similarly well across both hospitals.
This challenge applies to all industries, not just healthcare. It highlights the importance of deeply understanding the training data distribution and having the domain knowledge to appreciate the nuances of the data and processes.
Given that there are no clear-cut solutions to this challenge, researchers and engineers must work together to develop systematic tools and processes to ensure that the algorithms are generalizable to data beyond the ones used for training.
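One such process is to evaluate the same model separately on data slices from each deployment environment rather than on a single pooled test set. Below is a minimal sketch, assuming a fitted scikit-learn-style classifier and hypothetical per-site datasets (none of these names come from the talk):

```python
from sklearn.metrics import roc_auc_score

def evaluate_by_site(model, datasets):
    """Report the same metric on each deployment-site slice so a drop in
    performance on out-of-distribution data is caught before rollout."""
    for site, (X, y) in datasets.items():
        scores = model.predict_proba(X)[:, 1]
        print(f"{site}: AUROC = {roc_auc_score(y, scores):.3f}")

# Hypothetical usage: held-out data from the original hospital vs. a
# second hospital with older machines and different imaging protocols.
# evaluate_by_site(model, {
#     "stanford_holdout": (X_stanford, y_stanford),
#     "community_hospital": (X_community, y_community),
# })
```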

(3) Change Management
Many teams fail to realize that AI solutions can disrupt the work of many stakeholders. Without the correct buy-in and acceptance, business users will hesitate to use the ML models you have painstakingly built.
The following figure illustrates the five key considerations in managing the change that technology brings:

In terms of providing reassurance and explaining the model’s predictions, teams can leverage tools for explainable AI (specific to each stakeholder’s needs), as well as third-party auditing services to review the ML system’s code and setup.
Once the stakeholders understand how ML algorithms derive their predictions and are reassured that it is done fairly and reasonably, it becomes more likely that the models will be deployed and used on the ground.
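For example, a library such as SHAP can produce both global and per-prediction explanations that teams can walk stakeholders through. A minimal sketch, assuming a fitted tree-based classifier `model` and a validation DataFrame `X_valid` (both hypothetical):

```python
import shap

# `model` is assumed to be a fitted tree-based classifier (e.g., XGBoost)
# and `X_valid` a pandas DataFrame of validation features.
explainer = shap.Explainer(model, X_valid)
shap_values = explainer(X_valid)

# Global view: which features drive predictions overall.
shap.plots.beeswarm(shap_values)

# Local view: why the model made one specific prediction,
# which is often what a domain expert or auditor asks about.
shap.plots.waterfall(shap_values[0])
```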
Of course, ML solutions by themselves are useless if they do not address a real business need. The first (and perhaps most important) step of any project is scoping, where technical and business teams come together to identify use cases at the intersection of ‘What AI can do’ and ‘What is valuable for the business’.
Conclusion
According to McKinsey, AI could add around $13 trillion to the world economy by 2030. More importantly, most of this value lies in industries outside consumer internet, which is a testament to the enormous amount of untapped value out there.
To realize the full potential of AI across all industries, there is a need to turn ML into a systematic engineering discipline where projects are deployed effectively to deliver significant value.
Feel free to check out the full seminar below:
Before You Go
I welcome you to join me on a Data Science learning journey! Follow my Medium page and GitHub to stay in the loop on more exciting data science content. Meanwhile, have fun bringing your projects to practical deployment!
Key Learning Points from MLOps Specialization – Course 1
End-to-End AutoML Pipeline with H2O AutoML, MLflow, FastAPI, and Streamlit