How can you avoid losing your company money?

Using machine learning models to create targets for marketing campaigns is probably one of the tasks with the highest "looks good on my test set" / "oh, it didn’t work in the real world" ratio.
Let’s set the scene. You have been hired as a Data Scientist by a digital sweet shop company. Your task is to optimize marketing campaigns, specifically mailout or push-notification campaigns where you typically award a coupon in exchange for buying a product. In this fictitious example, let’s say the product is a fine Belgian Chocolate Box with a $12 margin, and you offer a $10 coupon for it. As a Data Scientist, you have built a well-performing Machine Learning model that identifies the customers who are most likely to buy the product.
Your model is a binary classifier that looks at the customers who have purchased the Box in the last 12 months by clicking on a banner on your website and compares them to those who visited your website but didn’t click on the banner. It produces a probability B of being a buyer given a set of features x, then ranks non-buyers from most to least likely to buy. These kinds of models are typically known as marketing propensity models [1].
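To make this concrete, here is a minimal sketch of such a propensity model. The three features, the data, and the choice of classifier are all hypothetical; the article does not specify them.

```python
# Minimal propensity-model sketch; features and data are invented.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000
X = pd.DataFrame({
    "visits_last_90d": rng.poisson(5, n),
    "avg_basket_value": rng.gamma(2.0, 15.0, n),
    "days_since_last_order": rng.integers(1, 365, n),
})
# Label: 1 if the customer bought the Box via the banner in the last 12 months
y = (rng.random(n) < 0.1).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)
X["p_buy"] = model.predict_proba(X)[:, 1]  # probability B of being a buyer

# Rank non-buyers into probability deciles (decile 1 = most likely to buy)
non_buyers = X[y == 0].copy()
non_buyers["decile"] = pd.qcut(non_buyers["p_buy"].rank(method="first"),
                               10, labels=list(range(10, 0, -1))).astype(int)
```

The ranking on ties is broken by row order (`rank(method="first")`) so that `qcut` always produces ten equally sized deciles.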
You run the first campaign. In order to test the model in a live scenario, you randomly select customers from all probability deciles and split them into an 80% target group and a 20% control group. The target group receives the nudge (email + coupon) while the control group is left alone.
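The split above can be sketched as a stratified 80/20 assignment within each decile. The sample size and decile assignments here are hypothetical:

```python
# Hypothetical pilot design: within each probability decile, randomly
# assign 80% of the sampled customers to target and 20% to control.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
pilot = pd.DataFrame({
    "customer_id": np.arange(5_000),
    "decile": rng.integers(1, 11, 5_000),  # 1 = highest propensity
})
# Stratified split: sample 80% per decile as the target group
target_ids = (pilot.groupby("decile", group_keys=False)
                   .sample(frac=0.8, random_state=0)["customer_id"])
pilot["group"] = np.where(pilot["customer_id"].isin(target_ids),
                          "target", "control")
```

Stratifying by decile (rather than splitting the whole pool at once) guarantees every decile keeps its own 20% control slice, which is what makes the per-decile lift analysis later possible.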

Once the campaign is over, you are anxious to check the results. Did the model work? As per Fig 2, it seems so! Your model correctly predicted that the customers you considered more likely to buy were indeed the ones with the highest success rate (buyers / total customers in the decile). Customers in the top decile were three times more likely to buy than customers in the third decile, and so on.

Let’s see how this worked out in economic terms. Given the success rate, the cost of the coupon, and the product margin, the campaign produced an Incremental Margin of around $7,200. Not bad at all. If you were to repeat this every month, you could anticipate an extra profit of roughly $86,000 a year.

At this point, you are definitely the hero in the room. But then you decide to check the Control Group’s performance by probability decile.

True, your model is good at predicting who will buy the product, and in fact, similar performance is observed in the control group. However, the campaign achieved its highest lift versus the control group in the middle and lowest probability deciles. A look at the economics of the campaign will (hopefully) make things clearer.

You include data on the Control Group size and the Control Success Rate. With this extra piece, it’s possible to calculate a No Campaign Margin in $: the margin you would have observed given the control group success rate and with no voucher awarded. The key metric is the Net Gain, the difference between the Incremental Margin and the No Campaign Margin. As we can see, the Net Gain is strongly negative for the first decile, which means that for those customers you were better off not running the campaign at all. Things start to change from the 5th decile on. Overall, you go from celebrating a $7k profit to having to explain an $18k loss.
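Per decile, the Net Gain calculation looks like the sketch below. The No Campaign Margin projects the control success rate onto the target group at the full $12 margin (no coupon cost); all rates and sizes here are invented to illustrate the pattern, not the article’s actual table:

```python
# Hypothetical per-decile Net Gain.
# Net Gain = Incremental Margin - No Campaign Margin.
import pandas as pd

margin, coupon = 12.0, 10.0
df = pd.DataFrame({
    "decile": [1, 5, 10],
    "target_size": [3_000, 3_000, 3_000],
    "target_rate": [0.30, 0.10, 0.02],    # success rate with the coupon
    "control_rate": [0.28, 0.01, 0.001],  # organic success rate, no coupon
})
df["incremental_margin"] = df["target_size"] * df["target_rate"] * (margin - coupon)
df["no_campaign_margin"] = df["target_size"] * df["control_rate"] * margin
df["net_gain"] = df["incremental_margin"] - df["no_campaign_margin"]
```

With these made-up numbers the top decile shows a large negative Net Gain (buyers who would mostly have bought anyway get a $10 discount on a $12 margin), while the middle and bottom deciles turn positive: the same pattern the campaign produced.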
Simply put, you were solving the wrong problem. Predicting who will buy the Belgian Chocolate Box is not the same as estimating who will buy it given the nudge. You stumbled into deal-chasers who would have bought the product anyway. As a matter of fact, the task should have been framed as a causality problem rather than a prediction problem.
Borrowing the words of Radcliffe & Surry [1]:
most targeted marketing activity today, even when measured on the basis of incremental impact, is targeted on the basis of non-incremental models.
How can you fix this?
To minimize the risk of giving money away in unproductive campaigns, you should consider running initiatives in steps, provided you have a large enough customer base (how large is an open question) and marketing software that supports a degree of customization.
Step one: run a pilot by randomly sampling from your propensity deciles. Split the sample into equal-sized target and control groups so that you have enough data to get statistically significant results.
Step two: shortly after the pilot, analyze the results. Find the propensity deciles (or use any other clustering method you might have in place) that show a remunerative lift versus the control group.
Step three: execute your campaign on the eligible customer base, excluding the customers you sampled for the pilot. This time you can allow a much smaller control group. The campaign should start a few weeks after the pilot: clearly, you cannot use the conclusions of a pilot run over Christmas for a campaign in July (overfitting is your biggest enemy here).
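The eligibility rule between pilot and full campaign can be sketched as a simple filter: keep only the deciles whose pilot Net Gain was positive (in practice a statistical significance check on the lift should come first). The per-decile figures below are hypothetical:

```python
# Sketch of the decile-eligibility rule; pilot figures are invented.
import pandas as pd

pilot_results = pd.DataFrame({
    "decile": range(1, 11),
    "net_gain": [-8000, -4000, -1500, -200, 300, 600, 500, 400, 250, 100],
})
eligible = pilot_results.loc[pilot_results["net_gain"] > 0, "decile"].tolist()
print(eligible)  # these deciles keep the coupon; the rest are blocked
```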

Going back to the campaign economics, this is equivalent to blocking the coupon for the unprofitable probability deciles (the ones in purple).

Had you done that, you would have seen an actual Net Gain of $7,000. Slightly less than what you initially anticipated, but definitely much better than the loss you made.
What comes next?
Finding the right target for a campaign is hard and sometimes cumbersome. A key takeaway is that it must be treated as a continuous experiment for as long as your customer base changes and interacts with your products. When you have budget and contact-pressure constraints, it becomes a data-intensive activity that should be supported by the right analytical capabilities and marketing automation software.
Machine Learning can become a powerful trick up your sleeve, whether through properly used supervised or unsupervised propensity models or through Causal Inference techniques such as incremental uplift models [2]. These could intervene in the evaluation phase between the pilot and the actual campaign, replacing the simpler approach to identifying no-coupon customers that I presented above.
In the next article I will present the X-Learner, a causal-inference uplift model that identifies the customers most likely to take your desired action as an effect of your campaign. Maybe you won’t get rid of deal-chasers for good, but you will surely give them a hard time.
References
[1] Radcliffe N. and Surry P., "Real-world uplift modelling with significance-based uplift trees," White Paper TR-2011-1, Stochastic Solutions (2011).
[2] Rodrigues J., Applied Data Science Techniques for Actionable Consumer Insights, Addison-Wesley (2020). The section on causal inference methods is where I first came across the idea of uplift modelling.