
Saver journeys: momentum, deep engagement & dynamic segments

We take a second look at Jupiter's early data, segmenting users and trying to predict what moves someone from a low to high propensity…

Copyright Jupiter Savings. Creative Commons Attribution 4.0 International License.

Synthesis:

  • App engagement, saving momentum and rewards allow ~90% accuracy in predicting savers’ segment dynamics.
  • Savers resist easy classification, but focusing on their journeys, not their static points, leads to structure and meaning that can be acted on.
  • Unfounded priors about inherent user characteristics, or about "big data", may be unhelpful, especially when applying data science in a product’s early days.

Finding segments that are both robust and useful

In a previous article, we took a look at Jupiter’s data on triggering users to save more. In this one, we look at what we can tell about savers’ overall behaviour – segmenting and clustering – for Jupiter’s first ~1,000 savers.

Step one is a traditional "RFM" segmentation – "recency" (how recently someone saved), "frequency" (how often they save), and "monetary" (what their balance is).

We divide all our savers into quartiles and then combine the labels, so "111" means "top quartile recency, top quartile frequency, top quartile monetary", and so on. Often when you do this, one or two segments will be dominant (111 if things are going well, 444 if you’re in trouble). Here’s our plot:
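
As a minimal sketch of this step, assuming a toy `savers` table with hypothetical column names (Jupiter’s actual schema and volumes will differ):

```python
import pandas as pd

# Hypothetical input: one row per saver, with days since last save,
# number of saves, and current balance.
savers = pd.DataFrame({
    "last_save_days_ago": [2, 40, 11, 90, 5, 33, 7, 61],
    "save_count": [14, 1, 6, 2, 9, 3, 11, 1],
    "balance": [5000.0, 150.0, 1200.0, 80.0, 3000.0, 400.0, 2500.0, 60.0],
})

def quartile_label(series: pd.Series, ascending: bool) -> pd.Series:
    # Rank savers, then cut the ranks into four equal buckets labelled
    # 1-4, where 1 is always the "best" quartile.
    ranks = series.rank(ascending=ascending, method="first")
    return pd.qcut(ranks, 4, labels=["1", "2", "3", "4"]).astype(str)

# Recency: fewer days since last save is better, so rank ascending;
# frequency and monetary: higher is better, so rank descending.
savers["rfm"] = (
    quartile_label(savers["last_save_days_ago"], ascending=True)
    + quartile_label(savers["save_count"], ascending=False)
    + quartile_label(savers["balance"], ascending=False)
)
print(savers["rfm"].value_counts())
```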

High heterogeneity in standard segmentation techniques

At least our biggest segment is 111, but that’s a lot of dispersion. We could group 111 and 112 as "frequent, high value", and then the 443 and 442 as "infrequent, low value", but at most that would catch ~30% of users.

Savers do not fall into nice buckets or a few personas

After this, we did a lot of wrangling. We added in behavioural data and ran a bunch of analyses (k-means, HDBSCAN, t-SNE, etc.). Most came out the same as RFM – lots of dispersion, no intuitive structure. Eventually, though, we combined two methods and started making progress:

(1) Applying principal component analysis (PCA) to find the primary source of variation in the streams of user events. We could explain a big chunk of the variance among users just by how often they opened the Jupiter app and engaged with "snippets" (in-app pieces of financial literacy), and how **often they saved and redeemed a boost. Cross-correlations reinforced this finding. Engagement is a meaningful axis of variation.**

(2) Making one high-level division of savers into those that saved more than once a month and those saving once or less, merging in the event counts per user, and running a clustering algorithm (HDBSCAN) within each of those larger pools. That division allowed us to find a good fit with just 2–3 clusters in each of the two pools, for five segments overall (a rough sketch of both steps follows).
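
A rough sketch of the two-step pipeline on synthetic data – Poisson noise won’t reproduce Jupiter’s real cluster structure, and the once-a-month threshold and `min_cluster_size` here are illustrative assumptions; the shape of the pipeline is the point:

```python
import numpy as np
import hdbscan  # pip install hdbscan
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Stand-in data: one row per saver, one column per event type
# (app opens, snippet views, saves, boost redemptions, ...).
rng = np.random.default_rng(0)
events = rng.poisson(lam=3.0, size=(1000, 12)).astype(float)
saves_per_month = rng.exponential(scale=1.5, size=1000)

# Step 1: PCA on the standardised event counts to find the main axes
# of variation across users.
scaled = StandardScaler().fit_transform(events)
pca = PCA(n_components=5).fit(scaled)
print("Explained variance ratios:", pca.explained_variance_ratio_)

# Step 2: split savers into two pools by saving frequency, then run
# HDBSCAN separately within each pool.
labels = np.full(len(events), -1)
for pool_id, mask in enumerate([saves_per_month > 1.0,
                                saves_per_month <= 1.0]):
    pool_labels = hdbscan.HDBSCAN(min_cluster_size=25).fit_predict(scaled[mask])
    # Offset cluster ids so the two pools' labels don't collide.
    pool_labels[pool_labels >= 0] += pool_id * 10
    labels[mask] = pool_labels
print("Segments found:", sorted(set(labels) - {-1}))
```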

When joined back together and plotted against the log of balances, we had finally found our way to a clustering that both represented the high-dimensional behavioural data and showed a clear progression in value. The five segments emerged from the data, not from our priors, so we couldn’t give them cute names, but we could start asking good questions about them.

Savers plotted by frequency of save and balance, coloured by value cluster

Using rich data to make the segmentation dynamic

A segmentation is only useful if it leads to action. So we asked: can we find aspects of past saver behaviour that predict what segment a saver falls into now? And from there, which of our levers will be most effective in affecting that behaviour, and hence in moving savers into more valuable segments?

To start, we ran a variety of models, using as input features the event counts for each user, and their final segment as the target label. Unfortunately, none of the models had accuracy much above 0.5 – a coin toss. Not wholly surprising, given a small data set with complex internal structure.
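
A sketch of this first, static attempt – the features and labels below are synthetic stand-ins, so the printed score illustrates the setup rather than Jupiter’s numbers:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Stand-in data: a few hundred savers, per-user event counts as
# features, current segment (0-4) as the label.
rng = np.random.default_rng(0)
X = rng.poisson(lam=3.0, size=(300, 12))
y = rng.integers(0, 5, size=300)

model = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
# On the real data, no model variant did much better than ~0.5.
print(f"Mean CV accuracy: {scores.mean():.2f}")
```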

But here having a rich data set, built and structured to feed into these analyses easily and smoothly, became invaluable. Every facet of a saver – events, balances, behaviour – is for us the aggregate of a stream of real-time events. With a bit of legwork we could slide both the segmentation and the event counts along a timeline almost arbitrarily.

We used this ability to assemble the data every 3 days over a period of 3.5 months, from early June to late September. That took us from a few hundred data points to close to fifteen thousand.
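
A minimal sketch of that assembly, assuming a hypothetical flat event log (in production these counts come straight off the event stream):

```python
import pandas as pd

# Hypothetical event log: one row per (user, event, timestamp).
events = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 2],
    "event_type": ["open_app", "save", "open_app", "save", "redeem_boost"],
    "timestamp": pd.to_datetime(["2020-06-03", "2020-06-20", "2020-06-05",
                                 "2020-07-11", "2020-08-02"]),
})

# One snapshot every 3 days: for each cut-off, count each user's events
# up to that date. Each (user, cut-off) pair becomes a data point.
snapshots = []
for cutoff in pd.date_range("2020-06-08", "2020-09-22", freq="3D"):
    past = events[events["timestamp"] <= cutoff]
    counts = (past.groupby(["user_id", "event_type"]).size()
                  .unstack(fill_value=0))
    counts["as_of"] = cutoff
    snapshots.append(counts.reset_index())

panel = pd.concat(snapshots, ignore_index=True).fillna(0)
print(len(panel), "user-time data points")
```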

We still didn’t get much lift in our models when we confined ourselves to the single-time-point question, "can we predict a user’s current segment?" That was to some extent unsurprising. Data augmentation is much less helpful than a naive idea of "big data" suggests if the augmentation is close to duplication. For a question about static, cross-sectional structure, just moving the cross section is very close to duplication (see the projection plots below).

t-SNE plots for limited data vs augmented: the 2-d space has filled up, but no clear structure has emerged
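
A toy illustration of the duplication point (synthetic data, not Jupiter’s): repeating each row with a little noise fills the projected space without creating new structure.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
base = rng.poisson(lam=3.0, size=(300, 12)).astype(float)

# "Augmenting" a static question by re-sampling the same savers at
# nearby time points is close to duplicating rows with small noise.
augmented = np.repeat(base, 20, axis=0)
augmented += rng.normal(0.0, 0.1, size=augmented.shape)

proj_small = TSNE(n_components=2, random_state=0).fit_transform(base)
proj_big = TSNE(n_components=2, random_state=0).fit_transform(augmented)
# Scatter-plot both: the 2-d space fills up, but no new clusters appear.
```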

So we changed the question to be about evolution in time – a question both more valuable and more tractable.

We asked: what predicts whether a saver will "upgrade" their segment in the next month? From a question about where a user is, we had a question about where a user is going. At each time in our expanded data set we looked ahead for a month, and checked for movement in the user’s segment. The fundamental data point became the stage in a saver’s journey.
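
A sketch of the label construction, assuming a `panel` like the one assembled above plus a per-snapshot `segment` column, ordered so that higher means more valuable (all names hypothetical):

```python
import pandas as pd

def add_upgrade_label(panel: pd.DataFrame) -> pd.DataFrame:
    # For each (user, as_of) row, check whether the user's segment
    # improves at any point in the following 30 days.
    out = []
    for _, user in panel.sort_values(["user_id", "as_of"]).groupby("user_id"):
        segments = user.set_index("as_of")["segment"]
        best_ahead = [
            segments[(segments.index > t) &
                     (segments.index <= t + pd.Timedelta(days=30))].max()
            for t in segments.index
        ]
        user = user.copy()
        user["upgraded"] = [pd.notna(b) and b > s
                            for b, s in zip(best_ahead, segments)]
        out.append(user)
    return pd.concat(out, ignore_index=True)
```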

That did not make the augmentation wholly lossless, but it did make it significantly better. For the question of understanding you at rest, you today and you a month ago are more or less the same; for the question of understanding how your saving habits are evolving, you today and you a month ago are different enough to yield additional meaning.

Engagement + momentum drive increases in value

The proof is in the pudding – the model metrics. On our augmented data, we ran models to predict whether a given saver, on a given day, with everything known about them at that time, would bump up a segment in the next month. Using just simple models, accuracy immediately jumped to 70%, with good false positive/false negative rates.
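
In outline (again on synthetic stand-in data, so the printed numbers are meaningless; on the real panel, simple models of this shape reached ~70%):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Stand-in panel: per-snapshot event counts as features, "upgraded a
# segment within the next month" as the binary label.
rng = np.random.default_rng(0)
X = rng.poisson(lam=3.0, size=(15000, 12))
y = rng.integers(0, 2, size=15000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
pred = model.predict(X_te)
print("Accuracy:", accuracy_score(y_te, pred))
print("Confusion matrix:\n", confusion_matrix(y_te, pred))
```

With panel data like this, a group-aware split (keeping all of one user’s snapshots on the same side of the train/test divide) is the safer choice, so the model isn’t graded on users it has effectively already seen.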

Since the dataset was quite well balanced, we decided to wheel in some heavy machinery and fed the data into Google Cloud’s AutoML. It fitted a fairly complex gradient-boosted tree and reached 90% accuracy. We worried about leaking labels, but the cross-correlations, training curves and feature-importance scores all reassured us.
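
AutoML is a black box, but a local gradient-boosted tree makes a reasonable stand-in for sanity-checking importances (continuing with synthetic stand-in data; the feature names are hypothetical):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
feature_names = ["balance_q", "recency_q", "app_opens", "message_views",
                 "snippet_taps", "referrals", "save_count", "boost_redeemed"]
X = rng.poisson(lam=3.0, size=(15000, len(feature_names)))
y = rng.integers(0, 2, size=15000)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

gbt = GradientBoostingClassifier(n_estimators=300, max_depth=3, random_state=0)
gbt.fit(X_tr, y_tr)
print("Test accuracy:", gbt.score(X_te, y_te))

# If a single feature dominates the importances, suspect label leakage.
for name, imp in sorted(zip(feature_names, gbt.feature_importances_),
                        key=lambda pair: -pair[1]):
    print(f"{name:>15s}  {imp:.2f}")
```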

So which features are most important?

First, momentum. The two features with the highest importance (25% and 15%) were the saver’s balance quartile and their recency quartile. So being able to save a lot (in raw amount) does matter – it would frankly be suspicious if it had no importance – but it only accounts for about a quarter of the likelihood that someone will move up a segment. And it is followed closely by recency. Taken together as a rough measure of "momentum", these explain about 40% of a saver’s future dynamics.

Second, engagement, especially deep engagement. The next feature in importance was how often the saver opened the app. **Then – and this surprised us – how often the user viewed the archive of in-app messages. Right after that came how often the user tapped the financial literacy "nuggets"** in the centre of the screen.

These can be interpreted as how often the user didn’t just open the app, but felt a reason to explore it. Each accounted for ~10% of total feature importance. When combined with a few other, related features (such as sharing referral codes or exploring "saving buddies"), deep engagement explained in total another 30% of variation.

Finally, emotional reward and savings habits. Echoing findings in our prior article, how frequently a user saved and whether they had redeemed a boost accounted for another 15% of feature importance. The final 10% was scattered across a range of other features.

Feature clusters and importance in final model

What about preventing savers from shifting down in the value segments? The momentum and engagement factors were just as important, but they were joined by outbound communication. That is, users were less likely to shift down (via a withdrawal or by going dormant) if we had recently sent them messages or made them boost offers, even if they hadn’t deeply engaged with the app. This was a relief, as we’d wondered if bothering people, or just reminding them too often that they had this savings pot, would sometimes induce withdrawals. Not that we took this as a licence to spam.

Conclusions: On the use (or not) of old mental models and big vs rich data

What can we say overall? First, there is some reinforcement for the findings in our first piece – behaviour is malleable, and emotion and engagement matter, not just for consumption apps, and not just for trading and crypto.

There are two other, broader points though. It’s sometimes surprising how often we are asked, "what demographics are you targeting?" On the one hand, it’s an understandable question, if just an initial shortcut. And at one level, age will make a difference to saving – career paths mean that the income available to save, and the pressures to do so, differ at different ages.

But on closer inspection, the question can represent an unfounded prior. Worse, it can be a poor shortcut. To caricature: why try to understand the nuances of saver behaviour if you can do an Excel pivot on age and gender and think it’s coherent? Simplistic data tools lead to simplistic ideas that become the priors that then dominate future analyses.

We have age data for Jupiter savers. We have not found it to usefully explain saver behaviour. We did not need it to get to 90% accuracy on predicting segment movement, or 70%+ accuracy on inducement response rates, or to find meaning in data that resisted any simple segmentation.

If demographics matter, they will show up as structure – so just look for structure. If they don’t, they are an imposition of false coherence, created by simple tools or big but limited data.

A related note is on how to judge data. All else equal, more data is better (much better) than less data. But all else is almost never equal.

Several of us on the Jupiter team use new, digital-only banks ("neobanks"). I opened my account this year when I moved countries (very early 2020). I use it as my main bank account. Mere transaction counts make that obvious. But the app keeps using valuable screen space to offer me help "migrating to us as your primary account". It also keeps offering me discounts on products and services I obviously already have, given the references on my transactions.

This neobank has over 5 million users, so it has "big data". But either the data is very poor, or it is very badly used, or both – so the data is of no more practical use than if they had 50 data points.

With that, this will be the last of our data posts for a while. We hope this post and the last one have given a good glimpse into what we’re building. Hopefully they’ve also made it clear how much can be done with rich data, even early – so long as you leave your priors at the door, and use the full power of modern data methods.

Most of all, we hope they’ve shown that some old ideas about saving – it’s all about defaults, you can’t make it fun, just reduce friction, and so forth – are at the least open to serious question. Even if it creates competition for us, we hope that others will soon move beyond the already-tired modern Fintech toolkit for saving (a goal here, a round-up there) and get much closer to understanding savers’ behaviour and helping them change it.

