
Call me crazy, but I’ve challenged myself to create the most extensive guide to Customer Lifetime Value (CLV) out there. Codenamed "everything the other tutorials left out", I’m sharing all the ideas and learnings I gained while working on this topic in a real-world data science team, with imperfect data, and complicated client needs.
My last post featured an ever-overlooked topic: use-cases for historic CLV calculation. It went a little viral, so I guess I’m onto something. In this post I’ll discuss:
- Some essential terminology
- The goal of CLV prediction
- CLV prediction uses (going beyond the standard examples, I promise!)
And in upcoming posts, we’ll talk about CLV calculation and prediction methods, their pros and cons, and lessons learned on how to use them correctly.
There’s loads to cover, so let’s get into it!
Laying the Groundwork
Whether you’re a data scientist, analyst or marketer, you need domain knowledge when embarking on data-driven research projects. So if you’ve made it this far without knowing what Customer Lifetime Value is – or how and why businesses should start with calculating historic CLV – then do visit the last post. I designed it to get you asking the right kinds of questions of your own data, which will help make your prediction efforts, and the actions you can take from them, all the more successful. Enjoy it, and see you back here soon.
Contractual vs Non-Contractual Customer Relationships
Speaking of groundwork, let me clarify two terms I’ll use often here, which describe the type of relationship a retailer may have with its customers:
- A ‘contractual’ relationship, such as a monthly phone or internet contract, is where customers are ‘locked in’. They’ll keep being customers unless their subscription has a planned end date, or they actively cancel.
- A ‘non-contractual’ situation, which most retail relationships are, has no lock-in. Customers can simply stop shopping with a given retailer at any time, either intentionally or even unknowingly, driven by their own changing needs. Shopping at a specific grocery chain, or on Amazon, are common examples.
The goal of CLV prediction
If you work for a retailer with non-contractual relationships, you never know which purchase will be a customer’s last. Even in contractual situations, customers can cancel at any time (if allowed to), or at least, at their next official renewal point. The uncertainty is scary, huh?
But imagine if you could estimate the likelihood that you’re about to lose a customer, or that you’ve already lost them. What if you could predict how many times they’re going to shop with you again, or renew their policies? What if you even knew, in advance, exactly how much they are going to spend? These are the fundamental tasks of CLV prediction, and they all point to one goal: to estimate the value a customer will generate for a retailer over a given, future period.
CLV Prediction Use Cases
If your mind isn’t already exploding with the possibilities that this can unlock for retailers, I’ve got a few ideas for you. As I said last time, don’t forget to think about how you can combine your predicted CLV information with other business data. This can help you unlock even more insights, and enable even more data-driven actions. For example:
Understand your customer base, and serve them better
Last time, I talked about two possible CLV workflows:
- Calculate historic CLV → identify different customer segments
- Identify customer segments → calculate the segments’ historic CLV.
CLV prediction provides the same options. And whichever order you choose, you can then investigate the results, to understand and better meet the specific needs of your customer segments. You especially want to figure out what makes for a high CLV customer, so you can concentrate your efforts on acquiring and serving more customers like them. I listed plenty of ideas last time, based only on using historic data. Now that we’re talking about prediction, using machine learning (ML) algorithms, there are some new possibilities…
- Use ‘model insights’ to explore characteristics of high CLV customers. Certain ML algorithms come with ‘explainability’ features. These help data scientists understand why the model made the predictions it did. A ‘decision tree’ algorithm, for example, learns rules from combinations of its input features: rules like, ‘customers who are registered members, aged under 30, living in city A, who made their first purchase in-store, will spend an average of $327 in the next three months.’ Data scientists and domain experts, such as your marketing team, can try to interpret what these rules mean in the real world. Why is city A important? Do people of a certain income bracket live there? Or are the stores there just better, e.g. with nicer checkouts or a wider product assortment?

- There are also machine learning explainability libraries which can show how different features impacted the model’s predictions. Again, these can be investigated by data scientists and domain experts together. Thinking creatively and critically can help you understand your customer relationships, and find ways to improve them.
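To make the ‘rules’ idea a little more concrete, here is a minimal sketch of how a tree-style model might learn a single rule. It is a one-split regression tree (a ‘decision stump’) in plain Python, not a full decision tree. The customer records, feature names and spend figures are all invented for illustration, and in a real project you would more likely use a library implementation.

```python
from statistics import mean

# Hypothetical training data: one dict per customer, with their spend
# over the following three months as the prediction target.
customers = [
    {"is_member": 1, "age": 24, "first_purchase_in_store": 1, "spend": 340.0},
    {"is_member": 1, "age": 28, "first_purchase_in_store": 1, "spend": 310.0},
    {"is_member": 1, "age": 45, "first_purchase_in_store": 0, "spend": 120.0},
    {"is_member": 0, "age": 33, "first_purchase_in_store": 0, "spend": 60.0},
    {"is_member": 0, "age": 52, "first_purchase_in_store": 1, "spend": 80.0},
    {"is_member": 0, "age": 26, "first_purchase_in_store": 0, "spend": 90.0},
]
FEATURES = ["is_member", "age", "first_purchase_in_store"]


def sse(values):
    """Sum of squared errors around the mean (0 for an empty group)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, features, target="spend"):
    """Find the single 'feature <= threshold' rule that best separates the target."""
    best = None
    for feature in features:
        # Try every observed value except the largest as a candidate threshold.
        for threshold in sorted({r[feature] for r in rows})[:-1]:
            left = [r[target] for r in rows if r[feature] <= threshold]
            right = [r[target] for r in rows if r[feature] > threshold]
            error = sse(left) + sse(right)
            if best is None or error < best["error"]:
                best = {
                    "feature": feature,
                    "threshold": threshold,
                    "error": error,
                    "left_mean": mean(left),
                    "right_mean": mean(right),
                }
    return best


rule = best_split(customers, FEATURES)
print(f"If {rule['feature']} <= {rule['threshold']}: predicted spend ~ {rule['left_mean']:.0f}")
print(f"Otherwise: predicted spend ~ {rule['right_mean']:.0f}")
```

On this toy data the stump picks membership as the most informative split. A real tree would keep splitting each group, which is how you end up with longer rules that combine several features.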
Nudge customers along a loyalty journey
- Turn highly-engaged new customers into happy regulars. Imagine that, among your new customers, some have an unusually high predicted CLV. These customers deserve special attention. Alongside your usual welcome emails, you could offer them extra promotions, refer-a-friend bonuses, or even customer satisfaction surveys. I know, surveys, eek! But if you make the process fun, quick, and easy, and show that you value the customer and their opinion, then you might get lucky. Wow them with your commitment to customer satisfaction, and the relationship is off to a good start.
- Turn loyal customers into brand ambassadors. Similarly, long-term customers with a higher predicted CLV than their cohort need extra TLC. You can use tactics similar to those just described, but you’ll probably have to be more thoughtful about it. New customers often expect a flurry of welcome emails, but older customers might need something a little more catchy to get them to stop and read.
Decide which customers to save…
- You might also have some long-term customers whose predicted CLV is starting to drop. If so, you need to figure out why. Have their needs and preferences changed? Have they forgotten about you? Or is it something more? This is where combining CLV data with other data sources is so valuable. Look at the customer’s prior purchase timelines, what they’ve bought, whether they’ve made lots of returns or contacted customer service, and so on. Try to establish whether there’s a poor product-customer fit, or whether the customer just doesn’t see the fit which is there.
- The goal is to figure out whether there’s still potential value for both you and the customer. You could do this by looking for any obvious changes, like a change of customer address, a shift in product category purchased, or dramatic decreases in purchase quantity, frequency or value. These might indicate a real change in the customer’s needs. For example, if they suddenly start buying baby items, they probably have new preferences, and a changed budget. Use this new knowledge to intuit whether the relationship is worth saving (given the cost of customer acquisition, I’d always assume it’s worth it, unless there’s strong evidence otherwise). Then you can try to fix the situation. In the last post, I mentioned including sizing recommendations as an example of improving the product fit. Other options could be more targeted, clearer communication, or even providing product recommendations (another data science topic, for another blog).
… and which ones to ‘fire’
- Some customers are not worth keeping. They return most of what they buy, and the costs of this go deeper than just refunding the original purchase price: think, shipping expenses (Cost of Delivery Services), payment fees from PayPal, Mastercard, etc (Cost of Payment Services), and wages to the staff who packed and unpacked the goods, checked their condition, steam-ironed them, and re-shelved them in the warehouse. At some point, their net revenue may even be negative. And if they also place heavy demands on your customer service team – for example, if refund requests are handled manually in your company – then they’re also detracting from the effort you could spend servicing better quality customers. So, if a customer’s CLV prediction is low, and their historic costs are high, you can simply stop engaging with them. I don’t mean ignore their service requests! You can bet they’ll spread bad reviews about you to other people then. But remove them from email and other marketing lists, and let the relationship fizzle out.
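As a rough sketch of that logic, the snippet below nets the cost items listed above against a customer’s revenue, then flags customers whose predicted CLV is low and whose history has been loss-making. The field names, the example figures and the `clv_floor` cut-off are all my own assumptions, as is the choice to read ‘historic costs are high’ as ‘net revenue is negative’. Your business may well define it differently.

```python
from dataclasses import dataclass


@dataclass
class CustomerHistory:
    customer_id: str
    gross_revenue: float   # value of everything the customer ordered
    refunds: float         # purchase price refunded for returned goods
    delivery_costs: float  # shipping out and back (Cost of Delivery Services)
    payment_fees: float    # PayPal, card fees, etc. (Cost of Payment Services)
    handling_costs: float  # wages for packing, checking, re-shelving returns
    predicted_clv: float   # model prediction for the coming period


def net_revenue(c: CustomerHistory) -> float:
    """Historic revenue after refunds and the hidden costs of returns."""
    return (c.gross_revenue - c.refunds - c.delivery_costs
            - c.payment_fees - c.handling_costs)


def should_stop_marketing(c: CustomerHistory, clv_floor: float = 20.0) -> bool:
    """Flag customers with a low predicted CLV and a loss-making history.

    clv_floor is a hypothetical cut-off; pick one that suits your business.
    """
    return c.predicted_clv < clv_floor and net_revenue(c) < 0


serial_returner = CustomerHistory("A", 500.0, 450.0, 40.0, 15.0, 30.0, 5.0)
happy_regular = CustomerHistory("B", 300.0, 20.0, 15.0, 9.0, 6.0, 120.0)

for customer in (serial_returner, happy_regular):
    print(customer.customer_id, net_revenue(customer), should_stop_marketing(customer))
```

The flag only decides who gets dropped from marketing lists. Service requests from flagged customers should still be handled as normal.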
Detect and foster business relationships
- Business customers have the potential to spend large amounts, very often, and may be less likely to leave you for other providers (owing to complicated internal purchasing procedures). And if they do leave you, they can take significant revenues with them. Ideally your business customers will make themselves known to you, but when they don’t, CLV could help you detect them. If you predict an unusually high number or value of future transactions for a specific customer, it might be a business. Try to verify this using their purchase history. How do the number and kinds of items or services bought compare to the entire customer population, and to any other business customers you do know of? You can then reach out to them for confirmation. And once you know they’re a business, you can give them extra focus, offering things like special shipping conditions, personalised service, and relationship management.
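One simple way to operationalise ‘unusually high’ is a robust outlier score. The sketch below compares each customer’s predicted number of orders to the median of the customer base, scaled by the median absolute deviation. The predictions, the customer IDs and the `cutoff` of 5 are invented for illustration, and this is just one of many possible outlier tests.

```python
from statistics import median

# Hypothetical model output: predicted number of orders in the next period.
predicted_orders = {"c1": 2, "c2": 3, "c3": 1, "c4": 2, "c5": 4, "c6": 40, "c7": 3}


def flag_possible_businesses(values_by_customer, cutoff=5.0):
    """Return IDs whose value sits far above the median of the customer base."""
    values = list(values_by_customer.values())
    med = median(values)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread at all, so nothing can be called unusual
    return [cid for cid, v in values_by_customer.items() if (v - med) / mad > cutoff]


print(flag_possible_businesses(predicted_orders))  # -> ['c6']
```

A flag like this is only a prompt to look at the purchase history and get in touch. It is not proof that the customer is a business.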
… and automate customer relationship management
- Perhaps the greatest benefit of predicting CLV is that it enables you to execute the above use-cases proactively. Historic analyses take time, and don’t always get priority among busy data science or marketing teams. Unfortunately, that means customers can cost the business a lot before anyone notices. Worse still, their purchase frequency might drop off significantly before you get around to re-engaging them. This can lead to losing that customer altogether.
- Luckily, your data science and engineering teams should be monitoring your in-production CLV prediction models anyway. This means they can set up automated alerts to fire when there’s a CLV risk. A simple way to start is to send an alert whenever a customer’s predicted CLV drops below a certain monetary threshold. However, this doesn’t take the distribution of typical CLV into account. There is probably a small minority spending big every month, and a large but faithful long-tail, who spend much less. If you try to create a single "CLV risk threshold", you could end up over-reacting to slight spending fluctuations in the long-tail, and missing drop-offs among the big spenders. A more sophisticated approach is to fire an alert whenever an individual’s prediction drops by a certain percentage, compared to their last prediction. Alerts can be routed to marketing or customer service staff, via email or Slack, for example, enabling them to act fast to save the relationship.
- Similarly, you can fire an alert for a predicted CLV increase, as this customer might be ready to be nudged along their loyalty journey towards ambassadorship. You can even set up alerts when certain input features to the model change dramatically.* Increased cost features could indicate a customer who needs to be ‘fired’, while increased units purchased could reveal a new business customer in need of special attention.
- *Another thing your data scientists and engineers should do, anyway.
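Here is a minimal sketch of the alert rules described above: a fixed monetary floor next to a percentage change against the previous prediction. The floor of 50 and the 30% limits are placeholder values I picked for the example, not recommendations, and the routing to email or Slack is left out entirely.

```python
def relative_change(previous: float, current: float) -> float:
    """Signed change versus the last prediction, e.g. -0.4 for a 40% drop."""
    if previous <= 0:
        return 0.0  # no meaningful baseline, so report no change
    return (current - previous) / previous


def clv_alerts(previous, current, floor=50.0, max_drop=0.30, min_rise=0.30):
    """Return the names of all alert rules triggered by a new prediction."""
    alerts = []
    if current < floor:              # simple rule: absolute monetary threshold
        alerts.append("below_floor")
    change = relative_change(previous, current)
    if change <= -max_drop:          # per-customer rule: large relative drop
        alerts.append("relative_drop")
    elif change >= min_rise:         # candidate for a loyalty nudge
        alerts.append("relative_rise")
    return alerts


# Long-tail customer with a small fluctuation: only the fixed floor fires.
print(clv_alerts(previous=55.0, current=48.0))      # -> ['below_floor']
# Big spender losing 40%: the fixed floor misses it, the relative rule catches it.
print(clv_alerts(previous=2000.0, current=1200.0))  # -> ['relative_drop']
# Prediction up 50%: a candidate for the ambassador treatment.
print(clv_alerts(previous=100.0, current=150.0))    # -> ['relative_rise']
```

The first two examples show the problem with a single threshold. It over-reacts to a small wobble in the long-tail, and stays silent while a big spender’s prediction falls by 40%.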
Anticipate revenue inflows
- A customer’s entire loyalty lifetime is too long a time-span to predict CLV over reasonably and reliably. Instead, we usually try to estimate the value a customer will generate over a specific future period, such as the next three months. You can thus anticipate revenue inflows, using the predictions of your entire customer base. Of course, this will be even more accurate if you have a good understanding of your customer acquisition rate, and some good historical analytics on when in their lifetime customers tend to spend (another topic from last post). This way, even for new customers, for whom a CLV prediction might not yet be very reliable, you can factor in average spending as identified in your historic CLV analyses.

- For contractual situations, anticipating revenue can be even easier. You know how much customers will spend per month; the only thing you have to worry about is whether and when they’ll churn. Spoiler: I’m working on a churn prediction guide, just as over-the-top detailed as this one!
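Both forecasts can be sketched in a few lines. For the non-contractual case, the sketch adds up the predictions for existing customers, then adds expected new customers multiplied by the average historic spend of a new customer over the same period. For the contractual case, it weights each month’s fee by the probability that the customer is still subscribed. All figures are made up, and the constant monthly churn probability, with fees billed at the start of each month, is a simplifying assumption of mine.

```python
def forecast_non_contractual(predicted_clv, expected_new_customers, avg_new_customer_spend):
    """Expected revenue over the prediction period (e.g. the next three months).

    predicted_clv: per-customer predictions for existing customers.
    expected_new_customers: from your acquisition rate (an assumption here).
    avg_new_customer_spend: from historic CLV analysis of new customers.
    """
    return sum(predicted_clv) + expected_new_customers * avg_new_customer_spend


def forecast_contractual(monthly_fee, monthly_churn_prob, months):
    """Expected revenue from one subscriber, billed at the start of each month.

    Assumes a constant monthly churn probability, which is a simplification:
    real churn risk usually varies over the customer lifetime.
    """
    survival = 1.0 - monthly_churn_prob
    # Month 0 is always paid; month m is paid only if the customer survived m months.
    return sum(monthly_fee * survival ** m for m in range(months))


print(forecast_non_contractual([120.0, 80.0, 300.0], 10, 45.0))  # -> 950.0
print(forecast_contractual(30.0, 0.0, 3))                        # -> 90.0
print(forecast_contractual(30.0, 0.5, 3))                        # -> 52.5
```

In the contractual case, almost all of the uncertainty sits in the churn probability, which is why churn prediction matters so much there.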
I’m still waiting for the Machine Learning stuff!
You’re right, enough about use cases, it’s time to get technical. So next time I’ll cover methods for CLV analysis, their advantages and disadvantages, and plenty of ‘gotchas’ to be aware of. And in part four, I’ll do the same for probabilistic and Machine Learning methods for CLV prediction. If you want to be updated about that, or all the other Data Science and marketing content I publish, you can follow me on here, Substack or Twitter (I mean, X). You can also check out my LinkedIn Learning course, Machine Learning in Marketing (it’s in German, but the code worksheets are in English, and have plenty of comments to make them easy to follow).