How To Acquire Clients For AI/ML Projects By Using Probability

Himanshu Chandra · Published in DataDrivenInvestor · Jul 30, 2020

Beating competing bids via binomial theorem & statistics


Quite often, we machine learning practitioners get swept away in the rush of applying different models to solve a certain problem. Extensive use of statistical tests takes either a back seat or comes into play only when presenting end results to the client.

While certain metrics like ROC, PR curves & MCC help you fine-tune your results in the final stages of your project (read: How NOT to use ROC, Precision-Recall curves & MCC), certain other metrics, if used right at the start, might help you win new projects for your organization.

Today, I present a case study of how we typically bag new projects and how metrics like p-values, confidence intervals and concepts like binomial series help us gain customers’ trust.

The Case In Point

This particular project was a computer vision one for the manufacturing industry. The client produces medium-sized metallic parts which are visually inspected for cracks, dents, rust and a few other categories of defects at the end of the production line. An image-based automated solution was required to replace the current manual process, since their daily production volume was on the rise, currently at around 1,500 parts per day.

Bids from multiple software vendors were invited, and we were one of them. As often happens, the client had a great record and expertise in the manufacturing domain, but not quite so much in the AI/ML field. The typical approach they planned to take was to check past projects, company size and commercial quotations, and try to quantify these into a general sense of confidence in a particular service provider.

The Gamble

We were not particularly happy about this approach, being a growing but still small company. We were, however, confident of our skills, and hence I proposed this to the client:

Why don’t we test our initial models, one month from now, on 1,000 parts and amongst all the vendors, see whose model performs the best?

This would become a PoC (Proof of Concept) and also help us estimate accurately for the longer engagement. By the time I proposed this, they had already narrowed the list down to us and another competitor. They were, however, on board with the PoC idea, since it seemed more quantifiable than their current methods, and asked us and the competitor to get started on the initial classification model.

The Catch

How do you confidently say one model is better than another based on just one classification run over 1,000 samples? After all, this model would be run on 350,000+ parts per year. We were questioned:

Is such a ‘sample run’ reliable enough for comparison?

Also, when we say ‘better’, what metric are we considering here? Is it accuracy, precision, recall, specificity, AUC or something else?

The second question is easier to answer, so let's start with that. Our client was very clear that precision is the metric they track, so there was nothing further for us to discuss there. For a quick review of the confusion matrix and the associated metrics, see the linked article.
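
To make the metric concrete, here is a minimal sketch (mine, not part of the original article) of how precision falls out of the confusion matrix counts; the labels below are purely illustrative, with 1 = defective and 0 = OK:

# Hypothetical ground truth and model predictions (illustrative only)
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives

precision = tp / (tp + fp)  # share of parts flagged as defective that really are
print(f"Precision: {precision:.2f}")  # 0.80 for the toy data above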

To answer the first question, we first set an expectation, a confidence level: 95% in this case. We told them that whatever the result of comparing the two models might be, we would be 95% confident that it was not down to chance. We could easily have chosen 90% or 99% or any other number, but they were fine with 95% at the PoC stage.

Now, to compare classification models there are a few statistical methods; McNemar's paired test is a commonly used one. It could work here, but we wanted something more intuitive, something the client could also trust when it was explained in day-to-day terms.
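
For completeness, here is a minimal sketch of how McNemar's paired test could be run with statsmodels; the 2x2 disagreement counts below are invented purely for illustration, not real project data:

from statsmodels.stats.contingency_tables import mcnemar

# 2x2 table over the same parts (illustrative counts):
# rows = our model correct / wrong, columns = competitor correct / wrong
table = [[850, 60],   # both correct, only ours correct
         [25, 65]]    # only competitor correct, both wrong

result = mcnemar(table, exact=True)  # exact binomial test on the discordant cells
print(f"McNemar statistic: {result.statistic}, p-value: {result.pvalue:.4f}")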

Binomial Series To The Rescue

Imagine this: Suppose you flipped a coin and got a head. You flipped again and got a head again. Then a tail. Then a tail...

After 20 flips, you saw that you had 7 heads and 13 tails. If I asked you to be 95% confident in your statement, would you say that the coin is a fair one, or is it biased towards tails? After all, one knows from experience that more often than not, you do not get an exact 50–50 split of heads-tails in 20 flips. That does not mean the coin is biased.

Alternatively put, how many tails would you want out of 20 flips, before saying that the coin seems to be favouring tails?

Or that the probability of tail performing better than head is significantly higher?

The answer to the above question is this:

[Image: Tails required to prove a tail-biased coin (image by author)]

The above is derived by applying the Binomial Formula. You can also find this out quickly using python:

from scipy.stats import binom

N, p, alpha = 20, 0.5, 0.05             # 20 flips, fair-coin null, 5% significance
distribution = binom(N, p)

heads = distribution.ppf(alpha)         # smallest head count whose cdf reaches alpha
if distribution.cdf(heads) > alpha:     # step back so that P(heads <= k) stays <= alpha
    heads = heads - 1

print(f"At {100*(1-alpha)}% confidence level, minimum tails required out of 20 flips for coin to be tail-biased: {format(N-heads,'.0f')}")

The output for N=20 (twenty flips or trials) is:

At 95.0% confidence level, minimum tails required out of 20 flips for coin to be tail-biased: 15
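
As a cross-check, the same threshold can be derived directly from the binomial formula by summing tail probabilities until they fall to alpha or below. This snippet is mine, not part of the original article:

from math import comb

N, alpha = 20, 0.05

def tail_prob(k, n=N, p=0.5):
    # P(X >= k) for X ~ Binomial(n, p), straight from the binomial formula
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Smallest number of tails whose one-sided tail probability drops to alpha or below
threshold = next(k for k in range(N + 1) if tail_prob(k) <= alpha)
print(f"Minimum tails out of {N} at 95% confidence: {threshold}")  # prints 15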

This is pretty much the same question we want answered for our PoC, where ‘tail’ is our model and ‘head’ is the competitor’s model. If we start out by assuming that both models are basically similar, neither better than the other, then both models should have a 50% probability of scoring the higher precision when tested on multiple datasets.

But if we test them on 20 datasets and see that model1 has a higher precision on at least 15 occasions, then we can conclude with 95% confidence that model1 is better than model2, precision-wise.

So that’s what we decided to ask for. We requested the client to split the 1,000 unique samples/parts into 20 groups of 50 parts each, such that each group had a fair and similar representation of the defects found in production. We then ran the two models on these 20 groups, noting down how many times our model ‘won’.
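
In code, the comparison boils down to a sign test over the paired per-group precision scores. This is a rough sketch under my own assumptions (one precision value per group for each model; tied groups simply dropped), not the exact code we used:

from scipy.stats import binom

def compare_models(precisions_ours, precisions_theirs, alpha=0.05):
    # One precision value per group for each model (20 groups of 50 parts in our PoC).
    # Groups where the two precisions tie are dropped for simplicity.
    pairs = [(a, b) for a, b in zip(precisions_ours, precisions_theirs) if a != b]
    wins = sum(1 for a, b in pairs if a > b)
    n = len(pairs)
    # One-sided p-value under the null "each model is equally likely to win a group":
    # P(X >= wins) for X ~ Binomial(n, 0.5)
    p_value = 1 - binom(n, 0.5).cdf(wins - 1)
    return wins, n, p_value, p_value <= alpha

A call like compare_models(ours, theirs) then returns the win count, the number of decisive groups, the p-value and whether the result clears the chosen confidence level.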

Our model showed a higher precision 16 out of 20 times. Looking back, we could even have gone for 99% confidence, and won! But then, hindsight’s always 20/20.

You can change alpha to 0.01 in the above code to see the output as 16.
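
To see why 99% would also have worked, here is a quick check (mine) of the exact one-sided p-value for 16 wins out of 20:

from scipy.stats import binom

p = 1 - binom(20, 0.5).cdf(15)  # P(X >= 16) under a fair 50-50 null
print(f"One-sided p-value for 16 wins out of 20: {p:.4f}")  # ~0.0059, below 0.01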

Learnings

It was much more convincing for the client to see that we won 16 out of 20 times on smaller datasets than to witness a single win on one bigger sample.

Knowing that we could deliver a better model was not enough; showcasing a rough version to the client as quickly as possible, in the most intuitive way, was.

Interested in knowing how similar metrics help us gain customer confidence and ensure we serve their business needs better? Read:

Interested in sharing ideas, asking questions or simply discussing thoughts? Connect with me through my website, I am Just a Student, or on LinkedIn, YouTube or GitHub.

See you around & happy learning!
