It’s easy to find yourself set up for failure. Verifying the metrics you’ll be measured against before you start was one of the most valuable lessons I learned (the hard way).

Destined for failure
This is an issue I’ve seen a few times over the years, and one I ran afoul of myself early in my career. It’s also one of the most important lessons I ever learned.
It’s common knowledge that a large proportion of data science projects fail. This is often attributed to working models never making it into production and the difficulties around MLOps and machine learning engineering. Before I continue: if you want a super simple guide to the end-to-end process, I wrote a post a while back that goes through the basics. You can check it out here:
It’s not going to land you an MLE role any time soon, but if you’re just starting out it can really boost your confidence and demystify some of the steps.
Now, although MLOps is a huge barrier for many data science projects, I think there’s another significant reason many projects stumble. In fact, during my consultancy days I spotted it more than once in organisations we’d worked with that had written off their whole foray into data science as a failure.
I’m acutely aware of this issue as it caught me out early on in my career. Luckily, there’s a pretty straightforward fix for it that we’ll go through shortly, but first, a story…
The parable of the eager data scientist

When I finished my doctorate I was super keen to become a data scientist. I’d spent years building computational models of complex laser systems and quantum mechanical processes – even going so far as to play with reinforcement learning for laser design! I was fortunate enough to find a role at a really early-stage startup in Glasgow that had a lot of promise. Unfortunately, that promise didn’t translate to sales and in less than a year I had to move on.
I landed in an insurance company, again a startup but far further along than my first role – we were actually making a profit! I had been brought into the pricing team to look at improving the all-powerful pricing model that sat at the core of the business.
There was a particular input to do with how people ultimately paid for their insurance. I won’t go into the details here, but there were essentially four outcomes: two were neutral, one meant the business made more profit than anticipated, and one meant it made less.
So, new in the door, I was told this might be a particularly tricky problem and something they wanted to tackle with data science. Eager to please, I put my hand up and got stuck in.
It was a great problem, well defined, and there was loads of data – here in the UK, if you sign up as a supplier to the big four price comparison websites, you receive roughly 1,000,000 car insurance quotes a day, each of which had (at the time) approximately 900 fields.
I spent some time researching other models and approaches across the industry, extracted expertise from the subject matter experts in our organisation, and collaborated with colleagues and the community alike. I eventually arrived at a pretty reasonable model with a ROC AUC of 0.82 for predicting someone’s likelihood of falling into the negative outcome category.
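To make that concrete, here’s a minimal sketch of the kind of evaluation involved – a stand-in gradient-boosted classifier on synthetic, imbalanced data, purely illustrative rather than the actual quote data or model:

```python
# Minimal sketch: evaluate a binary "negative outcome" classifier with ROC AUC.
# Synthetic data and a stand-in model, purely illustrative; not the real
# insurer data or pricing model.
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Imbalanced synthetic "quotes": roughly 10% fall into the negative outcome class
X, y = make_classification(n_samples=50_000, n_features=40,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

model = HistGradientBoostingClassifier(random_state=0).fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]
print(f"Holdout ROC AUC: {roc_auc_score(y_test, probs):.2f}")
```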
This is when things got difficult.
When the model was plugged into the big pricing mechanism, the optimisation started to give results that didn’t agree with what the SMEs knew. I got grilled and the modelling was torn to bits.
Back to the drawing board.
I went through several iterations of this without much significant change. Each new approach, and all the additional research, kept bringing me to similar results. As an early-career data scientist only in my second job, I started to worry.
Starting to feel quite stuck, I asked to be shown the exact processes used by the SMEs to come to their conclusions. What models were they using that gave results so wildly different?
That’s when it all became clear.
When I was shown how things were done and started to drill into how things had previously been calculated, it became apparent that we were measuring completely different things. In fact, they were so different it would have been impossible for me to ever land on a satisfactory result.
After some time and consultation, my model eventually surpassed the existing approach and led to some counterintuitive findings – the optimisation at the core of the pricing model actually favoured those in the negative outcome category over others as they had some not-so-obvious longer-term behaviours that significantly increased their lifetime value!
Apples to apples

So what’s the point in all of that?
Ever since that episode, I’ve had a rule for myself whenever I start a project to improve an existing model or process:
Recalculate the scoring metric for the existing model using the same process you intend to measure your model on
This might seem straightforward to many, but it’s easily overlooked. I’ve seen it be the downfall of entire teams: they struggle to make real improvements because their rigorous approach can’t compete with some sloppy fantasy metric that management hold dear for an existing process. You know the kind – the spreadsheet full of fudge factors and tweaks that’s been running the company for years, or data that’s cobbled together and ‘massaged’ into shape manually each month.
By not holding the wider business to the same standards as your own approaches, you start at a significant disadvantage.
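In code terms, holding both to the same standard just means scoring the incumbent’s historical outputs with exactly the same function, on exactly the same holdout rows, as your model. Here’s a minimal sketch, assuming the existing process produces some per-case score or decision you can line up against actual outcomes (the file and column names are purely illustrative):

```python
# Apples to apples: score the existing process and the candidate model
# with the SAME metric on the SAME holdout data.
import pandas as pd
from sklearn.metrics import roc_auc_score

# Hypothetical holdout set containing the actual outcome alongside what the
# existing process and the new model each produced for the same cases.
holdout = pd.read_csv("holdout_with_baseline.csv")  # illustrative path

y_true = holdout["negative_outcome"]          # 1 = bad payment outcome
baseline = holdout["existing_process_score"]  # incumbent process's output
candidate = holdout["new_model_probability"]  # new model's predicted probability

print(f"Existing process ROC AUC: {roc_auc_score(y_true, baseline):.2f}")
print(f"New model ROC AUC:        {roc_auc_score(y_true, candidate):.2f}")
```

If the incumbent only produces hard yes/no decisions rather than scores, the ROC curve collapses to a single operating point – exactly the kind of limitation worth flagging up front.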
Couple this with the resistance you can often find from people eager to prove data science is a waste of time and you may never even get off the ground.
Now, it must be said that in some cases it’s not actually possible to recreate the existing approach or understanding accurately or rigorously. That’s OK. All you need to do in this circumstance is outline the limitation from the start and, if possible, suggest a proxy or similar calculation to compare your progress against.
As a data scientist, it is your job to educate the wider organisation and help it become more data-driven.
Done properly, this can really make you stand out as a positive force for change across the business and bring people on the journey with you.
Summary
Make sure you’re not attempting the impossible: before you start, ensure that you (or someone in the organisation) can recreate the metric against which you’ll be measured.
It’s hard to know how far you’ve come if you don’t know where you started.