
How to Make Real-Time Machine Learning Work for User Journeys

Why combining in-session and historical behavior is the right approach for B2C organizations

Aron Visuals | Unsplash

The onsite user experience is the most important touchpoint between a digital business and its customers. That’s why leading B2C organizations often turn to dynamic decisioning strategies to optimize their user journeys. A dynamic decisioning engine powered by Machine Learning (ML) can help product teams meet consumers’ increasing demand for personalization, while also guiding them toward meaningful business outcomes such as conversions and engagement.

Real-Time Machine Learning can enhance dynamic decisioning by automatically adjusting to new information throughout each user’s session. Compared to batch inference, updating predictions in real time offers two substantial benefits: (1) more accurate models, and (2) the ability to target anonymous and first-time users. However, Real-Time ML comes in many forms, and not all of them are well suited to optimizing user journeys. In this post, we make the case that B2C organizations looking to maximize the value of their dynamic user experiences should use a particular form of Real-Time ML: real-time inference powered by a combination of historical and in-session data.

Advantages of Real-Time Inference

Once a machine learning model has been trained, using that model to make a prediction is, in most cases, a two-step process:

  1. Feature Engineering – Processing raw data into the more predictive, machine-readable variables the model was trained on. Examples of features include changes in user behavior over time, demographic information, and the time of day.
  2. Inference – Feeding those features through the model to generate a prediction.
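A minimal sketch of this two-step flow, with a simple linear scorer standing in for any trained model (the event schema and feature names are illustrative, not from a particular library):

```python
def engineer_features(raw_events):
    """Step 1: transform raw behavioral events into model-ready variables."""
    return {
        "n_pageviews": sum(1 for e in raw_events if e["type"] == "pageview"),
        "n_clicks": sum(1 for e in raw_events if e["type"] == "click"),
    }

def predict(weights, features):
    """Step 2: feed the engineered features through the trained model.
    A hand-written linear model stands in for any trained estimator."""
    return sum(weights[name] * value for name, value in features.items())

events = [{"type": "pageview"}, {"type": "click"}, {"type": "pageview"}]
features = engineer_features(events)
score = predict({"n_pageviews": 0.5, "n_clicks": 1.0}, features)
```

The key point is that the exact same `engineer_features` logic must run both at training time and at prediction time, or the model will see inputs it was never trained on.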

For example, consider a machine learning pipeline designed to predict how a promotional discount will impact each user’s likelihood to convert to a paid subscription. The raw data gathered for this use case may consist of behavioral events – clicks, logins, pageviews, etc. But in order to train a high quality model, these events must first be transformed into higher-level features such as: "how many times has the user logged in over the past two weeks?", or "what percent of the user’s sessions occur on a mobile device?". Once training is complete, these same features must be supplied to the model during inference in order to make a prediction.
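The two example features above might be derived from raw events along these lines (a sketch; the field names `at`, `session_id`, and `device` are assumptions about the event schema):

```python
from datetime import datetime, timedelta

def conversion_features(events, now):
    """Turn raw behavioral events into the higher-level features
    described above. Field names are illustrative."""
    cutoff = now - timedelta(days=14)
    logins_last_2w = sum(
        1 for e in events
        if e["type"] == "login" and e["at"] >= cutoff
    )
    # One device label per session, keyed by session id.
    sessions = {e["session_id"]: e["device"] for e in events}
    mobile = sum(1 for d in sessions.values() if d == "mobile")
    pct_mobile = mobile / len(sessions) if sessions else 0.0
    return {"logins_last_2w": logins_last_2w, "pct_mobile_sessions": pct_mobile}
```

Note that both features require scanning weeks (or a lifetime) of events, which matters later when we ask where this computation can feasibly run.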

Most ML systems conduct both feature engineering and inference through offline batch processing on a recurring schedule. The resulting predictions can be stored and accessed very quickly for onsite dynamic decisioning, but they won’t reflect any information that has accrued since the batch process last ran. What’s more, batch predictions are not available for first-time or anonymous users, who have no prior behavioral history. This dearth of real-time data is increasingly problematic amid sustained industry pressure on third-party cookies.

Real-time inference seeks to fill these gaps by leveraging the most up-to-date information available. In many cases, real-time inference leads to better predictions for existing users (since in-session data is often critical for strong performance), and it enables a business to target a broader set of users (e.g., anonymous and first-time users). Coupled with dynamic decisioning engines, these benefits can translate to more conversions and revenue for digital product teams.

Implementing Real-Time ML: The Challenge for User Journeys

In the ideal scenario, both feature engineering and inference happen in real time: as users browse your site, their behavior is automatically combined with historical events into sophisticated features, and those features are used to make up-to-date predictions. In practice, however, there is a tradeoff between (1) how much real-time processing can occur and (2) how quickly predictions can be obtained. The faster you need results returned, the less sophistication you can put into feature engineering.

For example, consider a feature describing what percentage of a user’s article views have occurred in the ‘News’ category over their lifetime as a customer. To re-compute this feature in real time, one would have to load and query the user’s entire behavioral history. Now consider that an ML model might require thousands of similar features, and it’s easy to see how this approach becomes infeasible for use cases that require low latency and client-side implementation. Because of this tradeoff, many domains where real-time ML is successfully deployed are those where all the information needed to make an accurate prediction is gathered directly from the client.
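The lifetime feature just described might look like the sketch below. The computation itself is trivial; the problem is the input it demands (names are illustrative):

```python
def pct_news_views(full_history):
    """Lifetime feature: share of a user's article views in the 'News'
    category. Recomputing this in real time means scanning the user's
    entire event history on every request -- tolerable for one feature,
    but infeasible when a model needs thousands of them at low latency."""
    views = [e for e in full_history if e["type"] == "article_view"]
    if not views:
        return 0.0
    return sum(1 for e in views if e["category"] == "News") / len(views)
```

Every call touches the full history, so latency grows with the user's lifetime rather than with their current session.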

For example, consider the facial recognition software that secures many of today’s smartphones. Once a model has been trained to recognize certain facial features, predicting whether a given face belongs to the phone’s owner depends only on a picture of that face. Because all the necessary input data is directly available in a machine-readable form and does not require joining with other data sources, the prediction can be made very quickly in real time. Many other Real-Time ML applications operate in a similarly self-contained environment.

For consumer data use cases, however, it’s typically not enough to rely exclusively on info from the current session. Predicting a user’s future behavior often requires a fuller picture of their past behavior. But, loading this data and incorporating it into up-to-date features is often prohibitively slow. As a result, B2C product teams are sometimes forced to choose between two imperfect solutions: use real-time inference based only on data from the current session, or generate batch predictions using only historical data.

A Best-of-Both-Worlds Solution

To navigate this challenge, organizations using Machine Learning for consumer journeys should adopt a hybrid approach. By leveraging a combination of historical features (pre-computed in batch) and in-session features (computed in real-time), businesses can reap the benefits of real-time inference without sacrificing the predictive value of past data.

In this framework (which we refer to as Real-Time Inference with Hybrid Features), each user’s entire behavioral history is accounted for at the time a prediction is made. But to sidestep latency issues, past and present data are never combined into a single feature. Instead, one set of features is built from the user’s current session, while a separate set is built from previous sessions. For example, rather than computing a feature such as "how many clicks does the user have in the past 7 days, including this session?", a hybrid feature system would generate two separate features which, taken together, capture the same information:

  1. "How many clicks did the user have in the previous 6 days?" (computed in batch, on the server)
  2. "How many clicks does the user have in the current session today?" (computed in real-time, in the client)
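In code, the split looks something like this sketch: the first feature is read from a pre-computed batch result, while only the second is computed on the fly (feature names and schema are illustrative):

```python
def hybrid_click_features(batch_features, session_events):
    """Combine a pre-computed batch feature with a feature computed in
    real time from the current session. Taken together, the two carry
    the same signal as 'clicks in the past 7 days, including today'."""
    clicks_this_session = sum(
        1 for e in session_events if e["type"] == "click"
    )
    return {
        # Computed overnight on the server; just a lookup at request time.
        "clicks_prev_6d": batch_features["clicks_prev_6d"],
        # Computed in the client from this session's events only.
        "clicks_this_session": clicks_this_session,
    }
```

The real-time work is now proportional to the size of the current session, not the user's lifetime, while the model still sees the full 7-day signal across the two features.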

This approach provides the best of both worlds for dynamic user journeys: sophisticated historical features that paint a comprehensive picture of past behavior; up-to-date in-session signals that enable contextual targeting and predictions for anonymous users; and low latency (on the order of hundreds of milliseconds) to ensure a responsive user experience.

Example: Dynamic Subscription Paywalls

To illustrate how this framework might work in practice, consider a media organization aiming to drive increased subscriptions through a system of dynamic paywalls. This system is driven by a set of ML predictions indicating how each user’s likelihood to subscribe would change if shown a hard paywall rather than being granted 3 additional free pieces of content.

Predicting how a user’s behavior will change if shown one paywall experience vs. another is a difficult task. To achieve a strong ROI, the organization needs accurate ML predictions. To generate accurate predictions, the model needs access to a rich set of features. To build these features, we use a hybrid of historical and in-session information.

First, consider the historical features. These might be re-computed daily on a batch schedule, using information collected up until the end of the previous day. For example, at midnight on January 1st, a batch process might run to re-compute thousands of analytics-style features using information collected through end-of-day on December 31st.

Now let’s say that User X logs onto the site on January 1st. To determine which paywall experience the user should receive, we make a request to generate a new prediction. The user’s batch features are fetched from a high-availability feature store, while a set of in-session features are computed in real-time in the client. Both sets of features are supplied to the model in order to make the best prediction possible.
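The request flow for User X might be sketched as follows. Here `feature_store` is a plain dict standing in for a high-availability feature store, and `model` is any callable scorer; both are assumptions for illustration:

```python
def paywall_prediction(user_id, session_events, feature_store, model):
    """Hypothetical request flow: fetch pre-computed batch features,
    compute in-session features on the fly, and score the combination."""
    # Batch features: a fast key-value lookup. Anonymous or first-time
    # users simply have no entry, so they fall back to an empty dict.
    batch = feature_store.get(user_id, {})
    # In-session features: computed in real time from the current session.
    in_session = {
        "articles_this_session": sum(
            1 for e in session_events if e["type"] == "article_view"
        ),
    }
    return model({**batch, **in_session})
```

Because the batch lookup degrades gracefully to an empty feature set, the same code path serves both known users (batch + in-session features) and anonymous users (in-session features only).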

Conclusion

Real-time inference can dramatically improve the efficacy of dynamic decisioning strategies, but it’s important for B2C organizations to adopt the right real-time implementation in order to realize its benefits. By leveraging a hybrid system of feature engineering which sources from data both past and present, these organizations can generate better predictions for a larger set of users in order to optimize the onsite experience and drive key metrics for their business.
