Getting the green light on project implementation

Robert de Graaf
Towards Data Science
Jan 30, 2018 · 4 min read


In my previous blog post, I discussed how to better sell data science projects. That post, while applicable to the whole process, is most useful in the initial phases, particularly for requirements gathering. In this post, I want to tip the balance towards the next stage of the process, where you present a project to be implemented.

My motivation here is that the place where the science of statistics meets the art of sales doesn't get talked or written about, despite being recognised at least some of the time as intrinsic to data science. If it does, I've had a hard time tracking it down. Unless I'm wrong, there isn't a MOOC about it at the moment, despite an arguable oversupply of data science MOOCs, and if it gets blogged about, it gets blogged about seldom. I'm trying to fill that gap. After all, there's little point writing yet another blog post about how one of the core machine learning algorithms works when Hastie et al. give away the ultimate guide to machine learning algorithms for free. My biggest hope in this vein, though, is that someone writes a comment on one of my posts that leads me to the Tukey, Breiman or Cleveland of statistics for commerce, and I will be able to hang up my keyboard.

This post will focus on what kind of model to present to a customer at a pre-implementation meeting, where you're looking for the customer to say yes to implementation and therefore commit more time and money. This stage could turn out to be the most difficult hurdle to get over, because this is where the time and money commitment increases from a small amount to a potentially much larger one. In an internal sales situation, this could mean an executive deciding whether to move the project from a small data science team to a larger team that will need to implement it and allow it to be used widely throughout the organisation.

Obviously in this scenario, it is vital to link implementation of your model to solving a real-world problem your customer is experiencing, and that problem should be one that matters to the customer. Where possible, you should be able to link solving the particular problem to saving a certain amount of money. You are more likely to be able to put a reasonable dollar value on it when you are presenting a model internally. At the same time, if your earlier discovery meetings went well, you will have some sense of how important the problem you are working on is to your customer.

The key to the customer buying your model's credibility is that the results make sense to her. That means not only will your model need to show great results, but your customer will need to understand how you evaluated it, relate that evaluation to her business and believe in the results.

There are many evaluation methods available, and their use can provoke controversy within the statistical and data mining communities. Frequently used methods such as the Receiver Operating Characteristic can be criticised on statistical grounds, with more robust but nuanced (and therefore harder to understand) alternatives proposed, and yet even the simple-but-wrong methods can be hard for a business audience to grasp.

There are also methods such as lift or gain that are tied tightly to the business problem the customer is trying to solve. Where the data and problem are suitable, this type of evaluation is ideal in a presentation meant to sell your results. For example, lift is explicitly tied to the marketing goal of increasing sales.
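To make that concrete, here is a minimal sketch of how lift at a given targeting depth could be computed, assuming a binary response variable and model scores in the style of scikit-learn's predict_proba; the function name and the commented usage are purely illustrative.

```python
import numpy as np

def lift_at_depth(y_true, y_scores, depth=0.1):
    """Lift at a targeting depth: the response rate among the top-scored
    fraction of customers divided by the overall response rate."""
    y_true = np.asarray(y_true)
    y_scores = np.asarray(y_scores)
    n_targeted = max(1, int(len(y_true) * depth))
    # Rank customers by model score, highest first, and keep the top slice
    top_idx = np.argsort(y_scores)[::-1][:n_targeted]
    targeted_rate = y_true[top_idx].mean()  # response rate if we contact only these
    baseline_rate = y_true.mean()           # response rate if we contact at random
    return targeted_rate / baseline_rate

# Illustrative usage (model, X_test and y_test are placeholders):
# lift = lift_at_depth(y_test, model.predict_proba(X_test)[:, 1], depth=0.1)
# A lift of 3 reads as: "contacting the top 10% of customers ranked by the
# model should yield roughly three times as many responders as contacting
# 10% of customers at random."
```

Framed that way, the number answers a question the marketing team is already asking, rather than one the model imposes on them.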

If neither is suitable, developing a metric that fits the problem may be a way forward. In either case, it is still best practice to also evaluate the model with a statistically robust method to ensure a correct assessment of its performance, without necessarily using that method to communicate the results (I hope to compare a couple of evaluation methods for classifiers at my blog over the next few weeks).
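One possible way to pair the two kinds of evaluation, sketched here with scikit-learn and the hypothetical lift_at_depth helper above (the synthetic data simply stands in for a real customer dataset), is to keep a proper scoring rule such as the Brier score for your own assessment while reporting lift to the business audience.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic, imbalanced stand-in data; in practice this is the customer's data.
X, y = make_classification(n_samples=2000, weights=[0.9], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000)

# Statistically robust check for your own assessment: a proper scoring rule
# (the Brier score) under cross-validation; closer to zero is better.
brier = -cross_val_score(model, X_train, y_train,
                         scoring="neg_brier_score", cv=5).mean()

# Business-facing number for the presentation: lift at the planned targeting
# depth, using the lift_at_depth function defined earlier.
model.fit(X_train, y_train)
lift = lift_at_depth(y_test, model.predict_proba(X_test)[:, 1], depth=0.1)

print(f"Brier score (internal check): {brier:.3f}")
print(f"Lift in the top 10% (for the customer): {lift:.1f}x")
```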

Now, for a few shining minutes it was okay to produce a model that was very accurate without anyone knowing how it came to its conclusions, and you can certainly still win Kaggle with a model of that kind. Unfortunately, though, it can be very difficult to convince a user to trust a model on an accuracy score alone. On the one hand, many metrics used to evaluate models are inaccessible to people who are not themselves machine learning practitioners. On the other, for many people an accuracy score isn't convincing on its own: such people wonder, ‘will it work with the next tranche of data?’ and ‘is it too good to be true?’

Ticking the box of presenting a model with interpretations the customer can understand is the crux of ensuring your customer believes in your model. It's too important to treat as a footnote here, so I will save it for the next installment.

See my blog landing page for details of other stories to come in this series.
