Do You Really Need a Feature Store?

In the majority of cases, a feature store is overkill

Lak Lakshmanan
Towards Data Science


It appears that every sophisticated ML team has built a feature store for their ML platform. Uber built Palette. Airbnb built Zipline. Netflix built Time Travel. Google Cloud worked with our customer GoJek to build Feast.

Fortunately, you no longer need to build or manage your own. Google Cloud Vertex AI offers a fully managed feature store, as does Amazon SageMaker. There are even companies like tecton.ai dedicated to building cloud-agnostic feature stores. Given all this, it can seem as if feature stores are the data warehouses of machine learning — not only should you be using feature stores, but you should be centralizing your ML platform around a feature store.

Don’t. In most cases, feature stores add unnecessary complexity. There are, however, a few instances in which a feature store will be invaluable.

tl;dr: Use a feature store if you need to inject features server-side, especially if the method of computing these features will keep improving. Otherwise, it is overkill.

Training-Serving Skew

First, let’s look at the problem that feature stores are trying to solve.

One of the major challenges in machine learning is training-serving skew. When an ML model is trained on preprocessed data, it is necessary to carry out the identical steps on incoming prediction requests. This is because we need to provide the model data with the same characteristics as the data it was trained on. If we don’t do that, we will get a skew between training and serving, and the model predictions will not be as good.

There are three ways to ensure that preprocessing done during training is repeated as-is during prediction: by putting the preprocessing code within the model, using a transform function, or using a feature store. Let’s discuss them one-by-one.

1. Within the model

The simplest option is to incorporate the preprocessing steps within the model function itself. For example, it might be carried out within a Lambda layer in Keras. Keras also provides out-of-the-box preprocessing layers. This way, when the model is saved, the preprocessing steps will automatically be part of the model.

Incorporating the preprocessing code into the model function. Image by author
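To make this concrete, here is a minimal sketch (illustrative data and feature names, not code from a real model) of in-model preprocessing using a Keras Normalization layer:

```python
import numpy as np
import tensorflow as tf

# Toy raw data: a single numeric feature (say, room price) and a label.
raw_prices = np.array([[100.0], [250.0], [175.0]], dtype="float32")
labels = np.array([0.0, 1.0, 1.0], dtype="float32")

# The Normalization layer learns its mean/variance from the raw data...
normalizer = tf.keras.layers.Normalization()
normalizer.adapt(raw_prices)

# ...and becomes part of the model graph, so serving applies it automatically.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    normalizer,
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(raw_prices, labels, epochs=2, verbose=0)

# The exported model carries the preprocessing with it
# (SavedModel in TF 2.x; use a .keras path on Keras 3).
model.save("model_with_preprocessing")
```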

The advantage of this method is its simplicity. No extra infrastructure is required, and the preprocessing code is carried along with the model. So, if you need to deploy the model on the edge, or in another cloud, there is nothing special you have to do. The SavedModel format contains all the necessary information.

The drawback to this approach is that the preprocessing steps will be wastefully repeated on each iteration through the training dataset. The more expensive the computation, the more this adds up.

Another drawback is that you have to implement the preprocessing code in the same framework as the ML model. Thus, for example, if the model is written using PyTorch, the preprocessing also has to be done using PyTorch. If your preprocessing code uses custom libraries, this can become difficult.

2. Transform function

The drawback with placing the preprocessing code within the model function is that the code needs to be used to transform the raw data during each iteration of the model training process.

This can be optimized if we capture the preprocessing steps in a function and apply that function to the raw data once. The model training is then carried out on the preprocessed data, which is more efficient. Of course, we have to make sure to invoke that function from both the training and the prediction code. Alternatively, we can capture the preprocessing steps in a container and interpose the container between the input and the model. While this adds efficiency, it also adds complexity — we have to save the transform function as an artifact associated with the model and know which transform function to invoke.

Encapsulate the preprocessing code into a transform function that is applied to both the raw dataset and to prediction requests. Image by author
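In code, the idea is simply to have a single function that both paths share. A minimal sketch, with illustrative constants and feature names:

```python
# Capture the preprocessing in one function and invoke it from both the
# training path and the serving path.
PRICE_MEAN, PRICE_STD = 180.0, 40.0

def transform(raw: dict) -> list:
    """Turn a raw record into the model's feature vector."""
    return [
        (raw["price"] - PRICE_MEAN) / PRICE_STD,       # scaled price
        1.0 if raw["day_of_week"] in (5, 6) else 0.0,  # weekend flag
    ]

# Training path: transform once, materialize, then train on the result.
raw_rows = [{"price": 150.0, "day_of_week": 5},
            {"price": 220.0, "day_of_week": 2}]
train_features = [transform(row) for row in raw_rows]

# Serving path: the identical function is applied to each request, so the
# training and serving representations cannot drift apart.
def serve(request: dict, model) -> float:
    return model.predict([transform(request)])
```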

Frameworks like TensorFlow Extended (TFX) provide a transform capability to simplify the bookkeeping involved. Some SQL-based ML frameworks, like BigQuery ML, also support a TRANSFORM clause.
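In TFX, for instance, you express the transformation as a preprocessing_fn; the framework applies it to the training data and embeds the same operations in the serving graph. A hedged sketch, with illustrative feature names:

```python
# scale_to_z_score and bucketize are real tensorflow_transform functions;
# the feature names here are illustrative.
import tensorflow_transform as tft

def preprocessing_fn(inputs):
    """TFX applies this to training data and embeds it in the serving graph."""
    return {
        "price_scaled": tft.scale_to_z_score(inputs["price"]),
        "visits_bucket": tft.bucketize(inputs["visits_last_hour"],
                                       num_buckets=10),
    }
```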

Prefer a transform function over putting the transformation code into the model if the extra infrastructure and bookkeeping overhead is worth it. This will be the case if the feature is computationally expensive.

3. Feature Store

Placing the preprocessing code within the model function or encapsulating it in a transform function (or SQL clause or container) will suffice for the vast majority of features.

There are two situations where these won’t suffice and you will need a feature store. A feature store is a repository for storing and serving ML features. It is essentially a key-value store where the key consists of an entity (e.g. hotel_id) and a timestamp, and the value consists of the properties of that entity (e.g. price, number of bookings, number of website visitors to the hotel listing over the past hour) as of that timestamp.

A feature store is a central repository that provides entity values as of a certain time. Image by author
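Here is a toy, in-memory illustration of that key-value contract (a real feature store adds scale, low-latency serving, and versioning on top of this idea):

```python
from bisect import bisect_right

# A toy in-memory stand-in for a feature store, for illustration only:
# key = entity id; value = (timestamp, features) rows sorted by timestamp.
store = {
    "hotel_123": [
        ("2021-06-01T10:00", {"price": 180.0, "visitors_last_hour": 42}),
        ("2021-06-01T11:00", {"price": 195.0, "visitors_last_hour": 57}),
    ],
}

def get_features(entity_id: str, as_of: str) -> dict:
    """Return the feature values that were current at `as_of` (never later)."""
    rows = store[entity_id]
    # ISO-8601 timestamps compare correctly as strings.
    i = bisect_right([ts for ts, _ in rows], as_of)
    if i == 0:
        raise KeyError(f"no features for {entity_id} before {as_of}")
    return rows[i - 1][1]

print(get_features("hotel_123", "2021-06-01T10:30"))
# {'price': 180.0, 'visitors_last_hour': 42}
```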

The first situation where you will need a feature store is if the feature value will not be known by clients requesting predictions, but instead has to be computed on the server. If the clients requesting predictions will not know the feature values, then we need a mechanism to inject the feature values into incoming prediction requests. The feature store plays that role. For example, one of the features of a dynamic pricing model may be the number of website visitors to the item listing over the past hour. The client (think of a mobile app) requesting the price of a hotel will not know this feature’s value. This information has to be computed on the server using a streaming pipeline on clickstream data and inserted into the feature store.
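A hypothetical serving handler then looks like this, reusing the toy get_features lookup sketched above; the client supplies only what it knows:

```python
# Hypothetical serving-side feature injection. The client sends only the
# hotel id and request time; the streaming pipeline keeps visitors_last_hour
# fresh in the store, and the server injects it before predicting.
def handle_prediction_request(request: dict, model) -> float:
    server_side = get_features(request["hotel_id"], as_of=request["timestamp"])
    features = [server_side["price"], float(server_side["visitors_last_hour"])]
    return model.predict([features])

request = {"hotel_id": "hotel_123", "timestamp": "2021-06-01T10:30"}
```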

The second situation is to prevent unnecessary copies of the data. For example, consider that you have a feature that is computationally expensive and is used in multiple ML models. Rather than using a transform function and storing the transformed feature in multiple ML training datasets, it is much more efficient and maintainable to store it in a centralized repository. Be careful about this — the increased efficiency may not be worth the increased complexity.

Don’t go overboard in either of these scenarios. For example, if all the features of the model that need to be computed server-side are computed in the same way (for example, retrieved from a relational database, or computed by a streaming pipeline), it’s perfectly acceptable to have the retrieval code in a transform function or container. Similarly, it is perfectly acceptable to repeat some feature processing a handful of times rather than complicate your ML platform with a feature store.

The canonical use of a feature store

The most important use case for a feature store is when situations #1 and #2 both apply. For example, consider that you need a “point-in-time lookup” to fetch the training data for a model. Features such as the number of website visitors over the past hour or the number of trips made by a driver in the past hour are used in multiple models, but they are pretty straightforward in that they are computed by a streaming pipeline, and so their real-time value can be part of the data warehouse. Those are relatively easy and don’t always need a feature store.

Now consider a different kind of feature that is used by many models but is also continually improved — for example, an embedding of a song, artist, and user in a music streaming service, where a team updates the user and song embeddings on a daily basis. Every time a model that consumes this feature is retrained — and high-commercial-value use cases will need to retrain periodically — the training code needs to fetch the values of this feature that align with the training labels and with the latest version of the embedding algorithm. This has to be done efficiently and easily across all labels, and across the tens or hundreds of features used by the model. This is where a feature store shines: it makes periodic model retraining on hard-to-compute, frequently improved features tractable.

A feature store is particularly useful for hard-to-compute features that are frequently updated, since models will have to be trained on “point-in-time” embeddings. Image by author
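Sketching the point-in-time lookup with the toy store from earlier: for each training label, we fetch the feature values that were current at the label’s event time, which is what prevents label leakage:

```python
# A sketch of the point-in-time lookup for training, reusing the toy
# `get_features` from the earlier sketch.
training_events = [
    ("hotel_123", "2021-06-01T10:30", 1),  # (entity, event time, label)
    ("hotel_123", "2021-06-01T11:30", 0),
]

training_set = [
    {**get_features(entity, as_of=ts), "label": y}
    for entity, ts, y in training_events
]
# Re-running this after the feature pipeline improves (e.g. new embeddings
# are backfilled into the store) regenerates a consistent training set.
```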

Decision Chart

The considerations discussed here are summarized in this decision chart:

Choosing between different options for capturing preprocessing. Image by author

This is not a decision tree to decide whether your organization needs a feature store — there’s probably a handful of features for which you do (and for those features, we’d love it if you used Vertex AI Feature Store). This is a decision tree to decide whether to use a feature store for the particular feature/model you are building.

Here are some concrete situations where you don’t need a feature store. If your feature is:

1. Known by the client.
2. Available in a data warehouse.
3. Not time-dependent.
4. Needed only for batch serving.
5. Computationally inexpensive.

Keep it simple.

Further Reading

  1. Feature Store is one of the design patterns that we discussed in our book Machine Learning Design Patterns. Even there, we warned against going overboard on this pattern, but I’m afraid that Feature Store is becoming what the Visitor pattern was in the Gang of Four book — used inappropriately more often than not.
  2. Use Keras Preprocessing Layers for in-model preprocessing, and the TRANSFORM clause in BigQuery ML for transformations. TFX provides Transform capability for TensorFlow/Keras models.
  3. On Google Cloud, if you need a feature store, use the fully managed Vertex AI feature store.

Thanks to my colleague Anand Iyer for helpful discussions on this topic.
