Over the last few years, I have watched hundreds of students and engineers build machine learning models. I have had many opportunities to be part of the jury in project competitions at engineering colleges. Similarly, I have served as a judge at a number of hackathons, where I saw contestants building models and systems in 36 or 48 hours, working day and night. As part of my responsibilities, I have reviewed the work of many interns and interviewed several candidates for ML engineering positions.
They all aim to solve challenging and groundbreaking problems. They want to detect cancer, predict fraud, and estimate project delays. What surprises me is that the reported accuracy is always above 90%, sometimes even above 95%.

Where then, I ask myself, is the problem? On one side, there are so many accurate models solving such serious and important problems; on the other, hardly any of these systems are in actual use. We don’t see such a gap in classical software. Before entering AI/ML, I wrote conventional software. As co-founder of a health informatics company, I built software for hospitals and clinicians. In that world, software that solves an important problem and works properly always finds traction. In other words, if the developer of a much-needed system is happy with its accuracy, the chances that the system will be used in the real world are very high.
I know that there are many differences in the way conventional software and AI/ML systems are developed and used. But none of those differences explains this strange gap between systems with very high accuracy and virtually no deployments.
From what I have seen, the reason for this phenomenon is largely psychological. It is an effect that resides in the mind of the software developer. But it must be taken seriously, because it is affecting the progress of the entire AI/ML paradigm.
It’s about closure. When does the developer of a program get closure? A program is a developer’s creation. Like all creations, every program you have written remains with you forever. But you get some kind of closure when someone else accepts or validates it.
In conventional software, this happens in two steps. First, there are testers who test the program. Once that is done, the developer finds some kind of relief. The baby is out in the real world now. But this does not always happen. When I wrote my first software for doctors, I was alone; there was no tester. So I went directly to step two, which is when the users use the program.
Remember that I am not talking about any process of bug fixing and the like. This is a purely psychological phenomenon of your creation being used by someone. Doubtless, bugs will be reported and you will fix them over time, sometimes over years. But closure is found when someone else tests or uses the program, not after fixing every bug ever reported.
In machine learning, I have observed a novel pattern of closure. As I said, I have watched a lot of ML systems being built. Building a machine learning model itself involves two steps: training and testing. The word ‘testing’ here is a technical term and is part of the model development process. It is not something done by a tester or a user.
The novelty that I want to describe begins with what this process produces. Apart from the model itself, most model training pipelines produce a number called ‘accuracy’ at the end. Sometimes it’s straightforward, like 90% accuracy. In some cases, the measures are more complicated. We will ignore the actual measure for now and just note that the process produces its own measure of accuracy. For example, if you use basic scikit-learn to train a logistic regression model, or TensorFlow to build a CNN-based image classifier, you will get this magic number.
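To make this concrete, here is a minimal sketch of that loop in scikit-learn. The dataset and model choices are only illustrative; the point is the shape of the ending, where the process reports its own score:

```python
# A minimal sketch of the standard train/test loop that ends in the
# "magic number". Dataset and model are illustrative choices.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)

# The development process produces its own measure of success.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Accuracy: {accuracy:.1%}")  # typically a number in the mid-90s here
```

Nothing outside this script is consulted at any point: the data, the model, and the verdict all live inside the developer’s own process.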
This number has an equally magical effect on the developer’s mind. The creator’s closure becomes instantaneously and automatically available. No need for a tester, or the interminable wait for the program to be actually used: the developer finds validation right at the end of the model creation process.
The accuracy at which developers find satisfaction varies with their position and experience. While a student may be satisfied with the very first number, a data scientist may not be happy with the starting accuracy and might want to improve it further. I am not referring to this dissatisfaction. The point is that whatever accuracy the developer eventually becomes happy with, it is still a number that the development process produces itself. It does not need anything or anyone outside.
I am beginning to think that this novel pattern of closure has a lot to do with the deployment gap. Accuracy is a deceptive measure of the usefulness of an ML system. The accuracy may well be 95%, but the really important issue is the cost of the 5% error. For example, what is the cost of failing to detect 5% of cancers? And even more importantly, what can we do, going beyond ML, to prevent this error, or at least reduce its cost considerably?
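A toy calculation makes the point. None of these numbers comes from a real study; the labels and cost figures are invented purely for illustration:

```python
# Toy illustration of how the same headline accuracy can hide very
# different real-world costs. All cost figures here are invented.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]   # 1 = cancer present
y_pred = [1, 0, 0, 1, 0, 0, 1, 0, 1, 1]   # one cancer case is missed

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

COST_FALSE_NEGATIVE = 500_000  # hypothetical cost of a missed cancer
COST_FALSE_POSITIVE = 500      # hypothetical cost of a needless follow-up

accuracy = (tp + tn) / len(y_true)
error_cost = fn * COST_FALSE_NEGATIVE + fp * COST_FALSE_POSITIVE
print(f"Accuracy: {accuracy:.0%}, cost of the errors: {error_cost}")
```

The accuracy number treats every error as equal; the moment you attach even rough costs to the two kinds of error, the single score stops telling you whether the system is usable.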
Somehow, in ML, developers are psychologically separated from this stage in the life of their program. Their closure comes much earlier, at the end of model training. I call this the ‘Open Loop of ML’. Imagine that you are the user of an ML system. The maker of the system has given you something with 98% accuracy, but you can’t quite figure out how to use it.
There are two reasons why you find it hard to use this system. One is the cost of the error. I have written an entire article on the consequences of error. We won’t go into those details now, but it is easy to see that the costs of even a few errors can be significant. But the major reason for your inability to use it is quite different: you do not know where exactly this 2% error is. In the thousands of images that you submitted, which are the ones the model couldn’t quite ‘get’? Of course, there are other measures, such as confidence, but my suspicion is that answering this question will require much more than another magic number.
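Even the standard tooling lets you go one step past the magic number, though. Continuing the earlier scikit-learn sketch (the fitted `model`, `X_test`, and `y_test` carry over from that illustration), you can at least list the specific test examples the model got wrong, together with its confidence on each:

```python
import numpy as np

# Per-class probabilities from the fitted classifier in the earlier sketch.
probs = model.predict_proba(X_test)
preds = probs.argmax(axis=1)
confidence = probs.max(axis=1)

# Indices of the test samples the model got wrong.
wrong = np.flatnonzero(preds != y_test)
print(f"{len(wrong)} misclassified out of {len(y_test)}")

for i in wrong:
    print(f"sample {i}: predicted {preds[i]} "
          f"(confidence {confidence[i]:.2f}), actual {y_test[i]}")
```

This does not close the loop by itself, but it shifts the conversation from a single score to the concrete cases where the system fails, which is where the user’s question actually lives.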
More than any numbers, we will need a different mindset: the mindset to close the loop, which might require looking beyond ML while still extracting all possible help from it.
In this article, I have tried to define a problem that I have seen up close. A few solutions have occurred to me, some of which I have tried in practice. But those will have to wait for another article.