
ML-Driven Marketing Campaign Targets

Creating ML-driven targets for marketing campaigns is a hard and often poorly understood practice.

How can you avoid losing your company money?

Photo by Possessed Photography on Unsplash

Using machine learning models to create targets for marketing campaigns is probably one of the tasks with the highest "looks good on my test set" / "oh, it didn’t work in the real world" ratio.

Let’s set the scene. You have been hired as a Data Scientist by a digital sweet shop company. Your task is to optimize marketing campaigns, specifically mailout or push notification campaigns where you typically award a coupon in exchange for buying a product. In a fictitious example, let’s say the product is a fine Belgian Chocolate Box with a $12 margin, and you offer a $10 coupon for it. As a Data Scientist, you have built a well-performing Machine Learning model that identifies the customers who are most likely to buy the product.

Your model is a binary classifier that looks at the customers who have purchased the Box in the last 12 months by clicking on a banner on your website and compares them to those who visited your website but didn’t click on the banner. It produces the probability P(B = 1 | x) of being a buyer given a set of features x, then ranks non-buyers from most to least likely to be a buyer. Models of this kind are typically known as marketing propensity models [1].
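The ranking and decile labelling described above can be sketched in a few lines of Python. This is a minimal sketch that assumes the propensity model is already trained and has produced P(B = 1 | x) for each customer to be scored (the non-buyers, in this setup); the function name `assign_deciles` and the customer IDs are hypothetical, and decile 1 is taken to hold the most likely buyers.

```python
def assign_deciles(scores: dict[str, float]) -> dict[str, tuple[int, int]]:
    """Rank customers by P(B = 1 | x) and label each with (rank, probability decile).

    `scores` maps a (hypothetical) customer ID to the model's predicted probability.
    Rank 1 is the most likely buyer; decile 1 holds the top 10% of customers,
    decile 10 the bottom 10%.
    """
    # Highest probability first.
    ranked = sorted(scores, key=lambda cid: scores[cid], reverse=True)
    n = len(ranked)
    labels = {}
    for rank, cid in enumerate(ranked, start=1):
        # Integer arithmetic splits the ranking into 10 buckets of near-equal size.
        decile = (rank - 1) * 10 // n + 1
        labels[cid] = (rank, decile)
    return labels


# Usage with made-up scores for 20 customers.
scores = {f"c{i}": i / 100 for i in range(1, 21)}
labels = assign_deciles(scores)
print(labels["c20"], labels["c1"])  # (1, 1) (20, 10)
```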

You run the first campaign. To test the model in a live scenario, you randomly select customers from all probability deciles and split them into an 80% target group and a 20% control group. The target group receives the nudge (email + coupon), while the control group is left alone.

Fig 1: On the right, the output of our Belgian Chocolate propensity model: P(B = 1 | x) is the probability of being a buyer given some features, Rank is the resulting order, and Probability Decile is a cluster label based on the decile the probability falls into. On the left, how we split customers into target and control groups. Image by the Author.
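The 80/20 split can be sketched as a stratified random assignment, so that every decile keeps the same target/control proportions. Stratifying by decile is an assumption on my part (the text only says customers are sampled from all deciles); `split_target_control`, its `seed` parameter and the input layout are hypothetical.

```python
import random
from collections import defaultdict


def split_target_control(deciles: dict[str, int], target_share: float = 0.8,
                         seed: int = 42) -> dict[str, str]:
    """Randomly assign each customer to "target" or "control" within its decile.

    `deciles` maps a customer ID to its probability decile (1-10).
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_decile = defaultdict(list)
    for cid, decile in deciles.items():
        by_decile[decile].append(cid)

    groups = {}
    for customers in by_decile.values():
        customers = sorted(customers)  # stable order before shuffling
        rng.shuffle(customers)
        cut = round(len(customers) * target_share)
        for cid in customers[:cut]:
            groups[cid] = "target"   # receives email + coupon
        for cid in customers[cut:]:
            groups[cid] = "control"  # left alone
    return groups


# Usage: 1,000 hypothetical customers, 100 per decile.
deciles = {f"c{i}": i % 10 + 1 for i in range(1000)}
groups = split_target_control(deciles)
print(sum(g == "target" for g in groups.values()))  # 800
```

Splitting inside each decile guarantees every decile ends up with a control group large enough to compare against, which a single global shuffle would only give you on average.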

Once the campaign is over, you are eager to check the results. Did the model work? As per Fig 2, it seems so! Your model correctly predicted that the customers you considered more likely to buy were indeed the ones with the highest success rate (buyers / total customers in the decile). Customers in the top decile were 3 times more likely to buy than customers in the third decile, and so on.

Fig 2: On the x-axis, the probability decile the customer belongs to; on the y-axis, the success rate, defined as total customers purchasing the product / total customers in that decile. The baseline (the horizontal grey line) is the overall success rate observed on the target group. Image by the Author.
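The success rate plotted in Fig 2 is simply buyers divided by customers, computed per decile, with the group’s overall rate as the baseline. A minimal sketch, assuming one (decile, bought) record per customer in the target group; `success_rate_by_decile` and the sample counts are made up for illustration.

```python
from collections import defaultdict


def success_rate_by_decile(records):
    """Return (rates, baseline) for one group, e.g. the target group.

    `records` is an iterable of (decile, bought) pairs, one per customer.
    rates[d] = buyers in decile d / customers in decile d
    baseline = all buyers / all customers (the grey line in Fig 2)
    """
    totals = defaultdict(int)
    buyers = defaultdict(int)
    for decile, bought in records:
        totals[decile] += 1
        buyers[decile] += int(bought)
    rates = {d: buyers[d] / totals[d] for d in sorted(totals)}
    baseline = sum(buyers.values()) / sum(totals.values())
    return rates, baseline


# Hypothetical target-group outcomes for two deciles of 100 customers each.
records = [(1, True)] * 30 + [(1, False)] * 70 + [(3, True)] * 10 + [(3, False)] * 90
rates, baseline = success_rate_by_decile(records)
print(rates, baseline)  # {1: 0.3, 3: 0.1} 0.2
```

The ratio `rates[1] / rates[3]` is the kind of "3 times more likely" comparison made in the text.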

Let’s see how this worked out in economic terms. Given the success rate, the cost of the coupon, and the product margin, the campaign produced an Incremental Margin of around $7,200. Not bad at all. If you repeated it every month, you could anticipate an extra profit of roughly $86,000 a year.

Fig 3: Target Success Rate = total purchasers / decile target size; Campaign Cost in $ = Target Size × Target Success Rate × Unitary Coupon Cost; Campaign Margin in $ = Target Size × Target Success Rate × Unitary Product Margin; Incremental Margin in $ = Campaign Margin in $ - Campaign Cost in $. Image by the Author.
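The Fig 3 formulas translate directly into code. This sketch hard-codes the example’s $12 unit margin and $10 coupon and assumes, as the formulas imply, that the coupon cost is only incurred when a customer actually buys. The per-decile sizes and rates in the usage example are hypothetical, not the figures behind Fig 3.

```python
UNIT_PRODUCT_MARGIN = 12.0  # $ margin per Belgian Chocolate Box (from the example)
UNIT_COUPON_COST = 10.0     # $ per redeemed coupon (from the example)


def incremental_margin(target_size: int, target_success_rate: float,
                       unit_margin: float = UNIT_PRODUCT_MARGIN,
                       unit_coupon_cost: float = UNIT_COUPON_COST) -> float:
    """Fig 3 economics for one decile of the target group."""
    buyers = target_size * target_success_rate
    campaign_cost = buyers * unit_coupon_cost    # coupons paid out
    campaign_margin = buyers * unit_margin       # margin on boxes sold
    return campaign_margin - campaign_cost


# Hypothetical deciles: {decile: (target size, target success rate)}.
deciles = {1: (1000, 0.30), 2: (1000, 0.20), 3: (1000, 0.10)}
monthly = sum(incremental_margin(n, rate) for n, rate in deciles.values())
yearly = 12 * monthly  # the "repeat it every month" projection
print(round(monthly), round(yearly))  # 1200 14400
```

Note that each buyer contributes only $12 - $10 = $2 here, which is why the Incremental Margin looks healthy only as long as nothing is compared against a control group.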

At this point, you are definitely the hero in the room. But then you decide to check the performance of the control group by probability decile.

Fig 4: On the x-axis, the probability decile the customer belongs to; on the y-axis, the success rate, defined as total customers purchasing the product / total customers in that decile. The baseline (the horizontal grey line) is the overall success rate observed on the target group. Image by the Author.

True, your model is good at predicting who will buy the product; in fact, a similar pattern is observed in the control group. However, the campaign achieved its highest lift over the control group in the middle and lowest probability deciles. A look at the economics of the campaign will (hopefully) make things clearer.

Fig 5: Additional calculated columns with respect to Fig 3: Control Success Rate = total purchasers in control group / control size; Success Rate Delta in pp = Target Success Rate - Control Success Rate; No Campaign Margin in $ = Control Success Rate × Target Size × Unitary Product Margin; Net Gain in $ = Incremental Margin in $ - No Campaign Margin in $. Image by the Author.

You add the Control Group Size and the Control Success Rate. With this extra piece, you can calculate a No Campaign Margin in $: the margin you would have observed at the control group’s success rate without awarding any coupon. The key metric is the Net Gain, the difference between the Incremental Margin and the No Campaign Margin. As we can see, the Net Gain is strongly negative for the first decile, which means that for those customers we would have been better off not running the campaign at all. Things start to change from the 5th decile onwards. Overall, you go from celebrating a $7k profit to having to explain an $18k loss.
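Bringing the control group into the calculation gives the Net Gain column of Fig 5. A minimal sketch for a single decile; `net_gain` is an illustrative name, the $12 / $10 figures come from the example, and the input rates are hypothetical.

```python
UNIT_PRODUCT_MARGIN = 12.0  # $ per box (from the example)
UNIT_COUPON_COST = 10.0     # $ per redeemed coupon (from the example)


def net_gain(target_size: int, target_success_rate: float, control_success_rate: float,
             unit_margin: float = UNIT_PRODUCT_MARGIN,
             unit_coupon_cost: float = UNIT_COUPON_COST) -> float:
    """Fig 5 Net Gain for one decile: Incremental Margin - No Campaign Margin."""
    incremental = target_size * target_success_rate * (unit_margin - unit_coupon_cost)
    # What the same customers would have generated anyway, at the control
    # group's success rate and without any coupon.
    no_campaign = control_success_rate * target_size * unit_margin
    return incremental - no_campaign


# Hypothetical high-propensity decile: 10% buy with the coupon, 5% would buy anyway.
print(net_gain(1000, 0.10, 0.05))  # -400.0
```

With these unit economics, a decile pays off only when N × s_target × (12 - 10) > N × s_control × 12, i.e. when the target success rate is more than six times the control success rate. Deal-chasers who buy anyway push the control rate up and make that bar very hard to clear in the top deciles.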

Simply put, you were solving the wrong problem. Predicting who will buy our Belgian Chocolate Box is not the same as estimating who will buy it because of the nudge. You ran into deal-chasers who would have bought the product anyway. The task should have been framed as a causal problem rather than a prediction problem.

Borrowing the words of Radcliffe & Surry [1]:

most targeted marketing activity today, even when measured on the basis of incremental impact, is targeted on the basis of non-incremental models.

How can you fix this?

To minimize the risk of giving money away in unproductive campaigns, you should consider running initiatives in three steps, provided you have a large enough customer base (how large is an open question) and marketing software that supports a degree of customization.

Step one: run a pilot by randomly sampling customers from each propensity decile. Split the sample into equal-size target and control groups so that you have enough data to reach statistically significant results.

Step two: shortly after the pilot, analyze the results. Identify the propensity deciles (or the segments from any other clustering method you have in place) that show a profitable lift over the control group.

Step three: execute your campaign on the eligible customer base, excluding the customers you sampled for the pilot. This time you can afford a much smaller control group. The campaign should start a few weeks after the pilot: you clearly cannot use the conclusions of a pilot run over Christmas for a campaign in July (overfitting is your biggest enemy here).

Fig 6: Before executing the campaign, run a small pilot to identify the population likely to purchase your product anyway. Image by the Author.
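Steps two and three can be sketched as follows. The decision rule is an assumption, one reasonable choice among many: keep a decile only if its point-estimate Net Gain is positive and a one-sided pooled two-proportion z-test says the target success rate is significantly above the control’s. All function names, the `alpha` threshold and the input layout are hypothetical.

```python
import math

UNIT_PRODUCT_MARGIN = 12.0  # $ per box (from the example)
UNIT_COUPON_COST = 10.0     # $ per redeemed coupon (from the example)


def one_sided_p_value(buyers_t: int, n_t: int, buyers_c: int, n_c: int) -> float:
    """Pooled two-proportion z-test with H1: target success rate > control success rate."""
    p_t, p_c = buyers_t / n_t, buyers_c / n_c
    pooled = (buyers_t + buyers_c) / (n_t + n_c)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))
    if se == 0:
        return 1.0  # nobody (or everybody) bought: no evidence of lift
    z = (p_t - p_c) / se
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability


def eligible_deciles(pilot: dict, alpha: float = 0.05) -> list:
    """Step two: deciles whose pilot lift is both profitable and significant.

    `pilot` maps decile -> (target buyers, target size, control buyers, control size).
    """
    keep = []
    for decile, (b_t, n_t, b_c, n_c) in pilot.items():
        # Net Gain per targeted customer, same formula as the Fig 5 Net Gain column.
        gain = ((b_t / n_t) * (UNIT_PRODUCT_MARGIN - UNIT_COUPON_COST)
                - (b_c / n_c) * UNIT_PRODUCT_MARGIN)
        if gain > 0 and one_sided_p_value(b_t, n_t, b_c, n_c) < alpha:
            keep.append(decile)
    return sorted(keep)


def campaign_population(deciles: dict, pilot_customers: set, keep: list) -> list:
    """Step three: eligible-decile customers, excluding everyone sampled for the pilot."""
    keep = set(keep)
    return sorted(cid for cid, d in deciles.items()
                  if d in keep and cid not in pilot_customers)


# Hypothetical pilot: decile 1 buys anyway, decile 7 responds to the coupon.
pilot = {1: (60, 500, 55, 500), 7: (40, 500, 2, 500)}
keep = eligible_deciles(pilot)
print(keep)  # [7]
print(campaign_population({"a": 7, "b": 7, "c": 1}, pilot_customers={"b"}, keep=keep))  # ['a']
```

Requiring significance on top of a positive Net Gain is a conservative choice: it keeps small, noisy deciles from being couponed on the strength of a lucky pilot.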

Going back to the campaign economics, this is equivalent to blocking the coupon for the unprofitable probability deciles (the ones in purple).

Fig 7: Same calculations as Fig 5, but this time the Unitary Coupon Cost is set to 0 for the first four probability deciles (no coupon offered), bringing the business case back to positive. Image by the Author.

Had you done that, you would have seen an actual Net Gain of $7,000: slightly less than you initially anticipated, but far better than the loss you actually made.
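The Fig 7 what-if can be reproduced by zeroing the coupon cost for the blocked deciles and summing the Net Gain. Like Fig 7, this sketch keeps each decile’s observed target success rate and changes only the unit coupon cost; `total_net_gain` and the table values are hypothetical.

```python
UNIT_PRODUCT_MARGIN = 12.0  # $ per box (from the example)
UNIT_COUPON_COST = 10.0     # $ per redeemed coupon (from the example)


def net_gain(target_size, target_rate, control_rate, unit_coupon_cost):
    """Net Gain for one decile (Incremental Margin - No Campaign Margin)."""
    incremental = target_size * target_rate * (UNIT_PRODUCT_MARGIN - unit_coupon_cost)
    no_campaign = control_rate * target_size * UNIT_PRODUCT_MARGIN
    return incremental - no_campaign


def total_net_gain(table, blocked=frozenset()):
    """Sum the Net Gain over deciles, with no coupon cost for the `blocked` ones.

    `table` maps decile -> (target size, target success rate, control success rate).
    """
    total = 0.0
    for decile, (size, rate_t, rate_c) in table.items():
        coupon = 0.0 if decile in blocked else UNIT_COUPON_COST
        total += net_gain(size, rate_t, rate_c, coupon)
    return total


# Hypothetical table: decile 1 is full of deal-chasers, decile 5 responds to the coupon.
table = {1: (1000, 0.20, 0.15), 5: (1000, 0.10, 0.01)}
print(round(total_net_gain(table)), round(total_net_gain(table, blocked={1})))  # -1320 680
```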

What comes next?

Finding the right target for a campaign is hard and sometimes cumbersome. A key takeaway is that it must be treated as a continuous experiment, for as long as your customer base keeps changing and interacting with your products. When you have budget and contact-pressure constraints, it becomes a data-intensive activity that should be supported by the right analytical capabilities and marketing automation software.

Machine Learning can become a powerful trick up your sleeve, either by properly using supervised or unsupervised propensity models or by leveraging Causal Inference techniques such as incremental uplift models [2]. These could be applied in the evaluation phase between the pilot and the actual campaign, replacing the simpler approach I presented above for identifying the customers who should not receive a coupon.

In the next article, I will present the X-Learner, a causal inference uplift model that identifies the customers who are more likely to take your desired action as an effect of your campaign. Maybe you won’t get rid of deal-chasers for good, but you will certainly give them a hard time.

References

[1] Radcliffe N. and Surry P. "Real-world uplift modelling with significance-based uplift trees." White Paper TR-2011-1, Stochastic Solutions (2011).

[2] Rodrigues J. Applied Data Science Techniques for Actionable Consumer Insights. Addison-Wesley (2020). The section on causal inference methods is where I first came across the idea of uplift modelling.

