Marketing Attribution with Markov

How Cloudera uses Markov models to solve the multi-channel attribution problem

James Kinley

Published in Towards Data Science, Aug 3, 2020

An edited version of this article was first published on ClickZ: Marketer’s guide to data-driven marketing attribution.

Marketing attribution is a way of measuring the value of the campaigns and channels that are reaching your potential customers. The point in time when a potential customer interacts with a campaign is called a touchpoint, and a collection of touchpoints forms a buyer journey. Marketers use the results of an attribution model to understand which touchpoints have the most influence on successful buyer journeys, so that they can make more informed decisions on how to optimise investment in future marketing resources.

Buyer journeys are rarely straightforward and the paths to success can be long and winding. With so many touchpoints to consider it is difficult to distinguish between the true high and low impact interactions, which can result in an inaccurate division of credit and a false representation of marketing performance. This is why choosing the best attribution model for your business is so important.

In this post, I provide some insight into how Cloudera has used Cloudera products to build a custom, data-driven attribution model to measure the performance of our global campaigns.

Limitations of traditional models

All attribution models have their pros and cons, but one drawback the traditional models have in common is that they are rules based. The user has to decide up front how they want the credit for sales events to be divided between the touchpoints. Traditional models include first-touch, last-touch, and linear-touch.
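To make the rules-based idea concrete, here is a minimal sketch of three such rules: first-touch, last-touch, and linear-touch. The function names and the sample journey are hypothetical and chosen only for illustration. Each function returns the share of credit given to each touchpoint in a single successful journey.

```python
def first_touch(journey):
    """All credit goes to the first touchpoint in the journey."""
    return {journey[0]: 1.0}


def last_touch(journey):
    """All credit goes to the last touchpoint before the sale."""
    return {journey[-1]: 1.0}


def linear_touch(journey):
    """Credit is split evenly across every touchpoint occurrence."""
    share = 1.0 / len(journey)
    credit = {}
    for touchpoint in journey:
        credit[touchpoint] = credit.get(touchpoint, 0.0) + share
    return credit


# Hypothetical journey: the buyer attends the webinar twice.
journey = ["Webinar", "LinkedIn Ad", "Email", "Webinar"]

print(first_touch(journey))
print(last_touch(journey))
print(linear_touch(journey))
```

In every case the split is fixed by the rule itself, regardless of what the data says about how the touchpoints actually influenced the outcome.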

Luckily there are more sophisticated data-driven approaches that are able to capture the intricacies of buyer journeys by modelling how touchpoints actually interact with buyers, and each other, to influence a desired sales outcome. A data-driven model provides marketers with deeper insight into the importance of campaigns and channels, driving better marketing accountability and efficiency.

Cloudera’s data-driven approach

The first attribution model we evaluated was based on the Shapley value from cooperative game theory. I covered the details of this model in a previous post. This popular model (devised by Nobel laureate Lloyd Shapley) provided much more insight into channel performance than the traditional approaches, but in its most fundamental implementation it didn’t scale to handle the number of touchpoints we wanted to include. The Shapley model performed well on a relatively small number of channels, but our requirement was to perform attribution for all campaigns, which can equate to hundreds of touchpoints along a buyer’s journey.
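The scaling problem is easier to see in code. The sketch below is a textbook exact Shapley value calculation, not the implementation we used, and the coalition values are made-up numbers. For each channel it loops over every subset of the other channels, so the work roughly doubles with each channel added.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values; `value` maps a frozenset of players to a number."""
    n = len(players)
    result = {}
    for player in players:
        others = [p for p in players if p != player]
        total = 0.0
        # Enumerate every coalition that does not contain `player`.
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                marginal = value(coalition | {player}) - value(coalition)
                total += weight * marginal
        result[player] = total
    return result


# Hypothetical conversions observed for each combination of two channels.
conversions = {
    frozenset(): 0,
    frozenset({"A"}): 10,
    frozenset({"B"}): 20,
    frozenset({"A", "B"}): 50,
}

values = shapley_values(["A", "B"], lambda coalition: conversions[coalition])
print(values)

# Number of coalitions that would need a value for 100 touchpoints.
print(2 ** 100)
```

With two channels there are only four coalitions to evaluate. With hundreds of touchpoints the exact calculation is out of reach without approximation.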

Before investing time in scaling out the Shapley algorithm, we researched alternative methods and decided to evaluate the use of Markov models to solve the attribution problem. We used the ChannelAttribution R package for the implementation and found that it produced similar results to the Shapley model, that it could scale to a large number of touchpoints, and that it was easy to set up and use in Cloudera Data Science Workbench (CDSW).

Markov attribution models

A Markov model is a probabilistic model that represents buyer journeys as a graph, with the graph’s nodes being the touchpoints or “states”, and the graph’s connecting edges being the observed transitions between those states. For example, a buyer watches a product Webinar (first state) then browses to LinkedIn (transition) where they click on an Ad impression for the same product (second state).

The key ingredient of the model is its set of transition probabilities (the likelihood of moving between states). The number of times buyers have transitioned between two states is converted into a probability, and the complete graph can be used to measure the importance of each state and the most likely paths to success.

For example, in a sample of buyer journey data we observe that the Webinar touchpoint occurs 8 times, and buyers watched the webinar followed by clicking on the LinkedIn Ad only 3 times, so the transition probability between the two states is 3 / 8 = 0.375 (37.5%). A probability is calculated for every transition to complete the graph.
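The counting step can be sketched in a few lines of Python. The journey data below is invented so that it reproduces the 3 / 8 figure, and the "Fail" state name and the `transition_probabilities` helper are assumptions made for this illustration.

```python
from collections import Counter, defaultdict


def transition_probabilities(journeys):
    """Turn observed journeys into a graph of transition probabilities.

    Each journey is a (touchpoints, converted) pair. Start and
    Success/Fail states are added around the touchpoints.
    """
    counts = defaultdict(Counter)
    for touchpoints, converted in journeys:
        path = ["Start", *touchpoints, "Success" if converted else "Fail"]
        for src, dst in zip(path, path[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
        for src, dsts in counts.items()
    }


# Hypothetical sample: the Webinar occurs 8 times and is followed
# by the LinkedIn Ad 3 times.
journeys = (
    [(["Webinar", "LinkedIn Ad"], True)] * 3
    + [(["Webinar", "Email"], True)] * 2
    + [(["Webinar"], False)] * 3
)

probs = transition_probabilities(journeys)
print(probs["Webinar"]["LinkedIn Ad"])  # 0.375
```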

Before we get to calculating campaign attribution, the Markov graph can tell us a couple of useful nuggets of information about our buyer journeys. From the example above you can see that the path with the highest probability of success is “Start > Webinar > Campaign Z > Success” with a total probability of 42.5% (1.0 * 0.425 * 1.0).
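The probability of a path is the product of the transition probabilities along it. The short sketch below uses only the three edges quoted above; the rest of the graph is not reproduced, and the `path_probability` helper is a name made up for this illustration.

```python
from math import prod


def path_probability(graph, path):
    """Multiply the transition probabilities along consecutive states."""
    return prod(graph[src][dst] for src, dst in zip(path, path[1:]))


# Only the edges on the quoted path are included here.
graph = {
    "Start": {"Webinar": 1.0},
    "Webinar": {"Campaign Z": 0.425},
    "Campaign Z": {"Success": 1.0},
}

p = path_probability(graph, ["Start", "Webinar", "Campaign Z", "Success"])
print(p)  # 0.425
```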

The Markov graph can also tell us the overall success rate; that is, the likelihood of a successful buyer journey given the history of all buyer journeys. The success rate is a baseline for overall marketing performance and the needle for measuring the effectiveness of any changes. The example Markov graph above has a success rate of 67.5%.
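One way to compute a success rate is to ask, for every state, how likely a journey passing through it is to end in Success, and then read off the value for the Start state. The sketch below does this by repeatedly sweeping over the graph. The transition values are invented for illustration, so the result is not the 67.5% quoted for the article's example, and the "Fail" state is an assumed name.

```python
def success_rate(graph, start="Start", success="Success", sweeps=100):
    """Probability of eventually reaching `success` from `start`."""
    states = set(graph) | {dst for dsts in graph.values() for dst in dsts}
    p = {state: 0.0 for state in states}
    p[success] = 1.0
    # Each sweep pushes success probability one step back towards Start.
    for _ in range(sweeps):
        for src, dsts in graph.items():
            if dsts:  # skip end states that have no outgoing transitions
                p[src] = sum(prob * p[dst] for dst, prob in dsts.items())
    return p[start]


# Hypothetical graph; every row of probabilities sums to 1.
graph = {
    "Start": {"Webinar": 0.6, "Email": 0.4},
    "Webinar": {"LinkedIn Ad": 0.5, "Success": 0.25, "Fail": 0.25},
    "Email": {"Success": 0.5, "Fail": 0.5},
    "LinkedIn Ad": {"Success": 0.5, "Fail": 0.5},
}

rate = success_rate(graph)
print(f"{rate:.1%}")
```

A fixed number of sweeps is a simplification. For large graphs, solving the equivalent linear system or simulating journeys would be the more usual choices.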

Campaign attribution

A Markov graph can be used to measure the importance of each campaign by calculating what is known as the Removal Effect. A campaign’s effectiveness is determined by removing it from the graph and simulating buyer journeys to measure the change in success rate without it in place. Removal Effect is a proxy for weight, and it’s calculated for each campaign in the Markov graph.
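The sketch below shows one common way to compute a Removal Effect. It is not the ChannelAttribution internals. It assumes that removing a campaign sends every transition into it to a "Fail" state, and that the effect is the relative drop in success rate. The graph values and helper names are hypothetical.

```python
def success_rate(graph, start="Start", success="Success", sweeps=100):
    """Probability of eventually reaching `success` from `start`."""
    states = set(graph) | {dst for dsts in graph.values() for dst in dsts}
    p = {state: 0.0 for state in states}
    p[success] = 1.0
    for _ in range(sweeps):
        for src, dsts in graph.items():
            if dsts:
                p[src] = sum(prob * p[dst] for dst, prob in dsts.items())
    return p[start]


def remove_state(graph, removed, fail="Fail"):
    """Copy the graph, sending transitions into `removed` to `fail`."""
    pruned = {}
    for src, dsts in graph.items():
        if src == removed:
            continue
        row = {}
        for dst, prob in dsts.items():
            target = fail if dst == removed else dst
            row[target] = row.get(target, 0.0) + prob
        pruned[src] = row
    return pruned


def removal_effects(graph, campaigns):
    """Relative drop in success rate when each campaign is removed."""
    baseline = success_rate(graph)
    return {
        campaign: 1.0 - success_rate(remove_state(graph, campaign)) / baseline
        for campaign in campaigns
    }


# Hypothetical graph, invented for illustration.
graph = {
    "Start": {"Webinar": 0.6, "Email": 0.4},
    "Webinar": {"LinkedIn Ad": 0.5, "Success": 0.25, "Fail": 0.25},
    "Email": {"Success": 0.5, "Fail": 0.5},
    "LinkedIn Ad": {"Success": 0.5, "Fail": 0.5},
}

effects = removal_effects(graph, ["Webinar", "Email", "LinkedIn Ad"])
for campaign, effect in effects.items():
    print(f"{campaign}: {effect:.2f}")
```

The effects do not sum to 1, because several campaigns can sit on the same successful paths. This is why they are normalised before being used as weights.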

Using Removal Effect for marketing attribution is the final piece of the puzzle. To calculate each campaign’s attribution value we can use the following formula: A = V * (Rt / Rv)

A = Campaign’s attribution value
V = Total value to divide. For example, the total USD value of all successful buyer journeys used as input to the Markov model
Rt = Campaign’s Removal Effect
Rv = Sum of all Removal Effect values

Let’s walk through an example. Say that during the first quarter of the fiscal year the total USD value of all successful buyer journeys is $1M. The same buyer journeys are used to build a Markov model, which calculates the Removal Effect for our Ad campaign to be 0.7 (i.e. the buyer journey success rate dropped by 70% when the Ad campaign was removed from the Markov graph). We know the Removal Effect values for every campaign observed in the input data, and for this example let’s say they sum to 2.8. By plugging the numbers into the formula we calculate the attribution value for our Ad campaign to be $250k:

$250,000 = $1,000,000 * (0.7 / 2.8)
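The same formula can be applied to every campaign at once. In the sketch below only the Ad campaign's Removal Effect (0.7), the sum (2.8), and the $1M total come from the example. The other campaign names and their individual values are made up so that the sum works out.

```python
def attribution_values(total_value, removal_effects):
    """Apply A = V * (Rt / Rv) to every campaign."""
    total_effect = sum(removal_effects.values())
    return {
        campaign: total_value * effect / total_effect
        for campaign, effect in removal_effects.items()
    }


# Removal Effects summing to 2.8; all but "Ad campaign" are hypothetical.
removal_effects = {
    "Ad campaign": 0.7,
    "Webinar": 0.9,
    "Email": 0.6,
    "Event": 0.6,
}

values = attribution_values(1_000_000, removal_effects)
for campaign, value in values.items():
    print(f"{campaign}: ${value:,.0f}")
```

Because each campaign's share is its Removal Effect divided by the sum of all of them, the attribution values always add up to the total value being divided.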

In addition to this, we calculate campaign ROI by subtracting the cost of running a campaign over the same period of time from its attribution value.

What’s nice about the ChannelAttribution R package is that it does all of this for you and even includes implementations of three of the traditional rules-based algorithms for comparison (first-touch, last-touch, and linear-touch). There’s a new Python implementation too.

Cloudera on Cloudera

We’re proud of our data practice at Cloudera. The marketing attribution application was developed by Cloudera’s Marketing and Data Centre of Excellence lines of business. It’s built on our internal Enterprise Data Hub and the Markov models run in Cloudera Data Science Workbench (CDSW).

By leveraging a data-driven attribution model we have eliminated the biases associated with traditional attribution mechanisms. We have been able to understand how various messages influence our potential customers and the variances by geography and revenue type. Now that we have solid and trusted data behind attribution, we’re confident in using the results to inform and drive our marketing mix strategy and investment decisions. And we can rely on the numbers when we partner with sales teams to drive our marketing strategies going forward.