The Importance of Having a Feature Store

I’ve seen much value gained from building and maintaining a centralized feature store. A feature store is a centralized software library that contains many functions, each of which creates a single feature from a standardized data input. These features can later be fed into machine learning algorithms that solve different problems.

Ido Zehori
Towards Data Science


Photo by Joshua Aragon on Unsplash

Although feature stores play a vital role in data strategy, it’s still difficult to find information about them online. But understanding what feature stores are and why they’re important is crucial, especially as data governance requirements grow and more business problems are solved with machine learning models. Indeed, feature stores should be a fundamental part of your company’s machine learning operation.

Among other benefits, three specific advantages make feature stores invaluable: they enable simple reuse of features across the company; they make it simple to standardize feature definitions and naming conventions; and they help businesses achieve consistency between the models a data scientist develops offline and the same models once they are deployed online.

What is a feature store?

Because “store” can have a number of meanings, it’s important to clarify that in the term “feature store,” the word relates to “storage.” The store is actually a centralized software library that contains many functions, each of which creates a single feature from a standardized data input. These features can later be fed into machine learning algorithms aimed at solving different problems.
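To make the definition concrete, here is a minimal sketch of such a library in Python. Everything in it is an assumption made for illustration: the event format, the registry, the decorator and the two feature names are hypothetical, not the API of any particular feature store.

```python
from typing import Callable, Dict, List

# Hypothetical standardized input: one user's raw events as plain dicts.
Event = Dict[str, object]
FeatureFn = Callable[[List[Event]], float]

# The "store": canonical feature name -> function that computes it.
FEATURES: Dict[str, FeatureFn] = {}


def feature(name: str) -> Callable[[FeatureFn], FeatureFn]:
    """Register a function under a canonical feature name."""
    def decorator(fn: FeatureFn) -> FeatureFn:
        if name in FEATURES:
            raise ValueError(f"feature {name!r} is already defined")
        FEATURES[name] = fn
        return fn
    return decorator


@feature("purchase_count")
def purchase_count(events: List[Event]) -> float:
    """Number of purchase events for the user."""
    return float(sum(1 for e in events if e["type"] == "purchase"))


@feature("total_spend")
def total_spend(events: List[Event]) -> float:
    """Sum of the amounts of all purchase events."""
    return float(sum(e.get("amount", 0.0) for e in events if e["type"] == "purchase"))


def compute(names: List[str], events: List[Event]) -> Dict[str, float]:
    """Build a feature vector by looking up each feature by name."""
    return {name: FEATURES[name](events) for name in names}
```

Each function takes the same input shape and returns one value, so any model can request any subset of features by name.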

When operating machine learning systems at scale, data professionals usually need to engineer large numbers of features in order to train their models. If a model succeeds at solving the problem for which it was created and is deployed in production, the exact same features must then be computed in the production environment and fed to the deployed model. A feature store becomes an invaluable resource to data scientists during this process.

Feature stores also allow data scientists to streamline the way features are maintained, paving the way to more efficient processes while ensuring that features are properly stored, documented and tested. Many projects and research assignments across a company use the same features. With a feature store, data scientists can quickly access the features they need and avoid doing repeat work. Feature stores also offer a tested, quality-assured way to create each feature, so anyone consuming it knows it is reliable.

Why do we need feature stores?

There are a few feature-specific challenges data scientists face that the use of feature stores helps alleviate. These include:

  • Features are not reused. A common obstacle data scientists face is spending time redeveloping features when using previously developed features or ones developed by other teams would have sufficed. Feature stores allow data scientists to avoid repeat work.
  • Feature definitions vary. Different teams at any one company might define and name features differently. Moreover, accessing the documentation of a specific feature (if it exists at all) is often challenging. Feature stores address this issue by keeping features and their definitions organized and consistent. The documentation of the feature store helps you create a standardized language around all of the features across the company. You know exactly how every feature is computed and what information it represents.
  • There is inconsistency between training and production features. Production and research environments often use different technologies and programming languages. The data streaming into the production system needs to be processed into features in real time and fed into a machine learning model. For the modelling effort to be effective, the model developed offline in research needs to produce exactly the same prediction as the model deployed online when given the same data as input. A feature store that is environment-agnostic (online and offline) ensures that, given the same data, the model is fed exactly the same features.
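The third point can be sketched in a few lines of Python. The sketch assumes the simplest case, in which research and production can both import the same function; the feature, the function names and the -1 sentinel are hypothetical choices made for illustration.

```python
from typing import Dict, List


def days_since_last_purchase(purchase_days: List[int], today: int) -> int:
    """Single shared feature definition; -1 means the user never purchased."""
    if not purchase_days:
        return -1
    return today - max(purchase_days)


def build_training_rows(history: Dict[str, List[int]], today: int) -> Dict[str, int]:
    """Offline path: compute the feature for every user in a historical dataset."""
    return {user: days_since_last_purchase(days, today) for user, days in history.items()}


def serve_feature(purchase_days: List[int], today: int) -> int:
    """Online path: compute the feature for one incoming request."""
    return days_since_last_purchase(purchase_days, today)
```

Because both paths call one definition, a parity check such as comparing `serve_feature` against `build_training_rows` on the same data cannot drift. When the two environments use different languages, the same idea would have to be enforced in another way, for example by running such a parity test across both implementations.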

Feature store benefits

When a company embraces feature stores, data professionals across teams can follow the same general workflow for any machine learning use case, regardless of the problem they are currently addressing (classification, regression, time series forecasting, etc.). This workflow is typically implementation-agnostic, which means it can be easily adopted for new algorithm types and frameworks, such as classical ML algorithms alongside newer deep learning frameworks.

Another major benefit of using feature stores is the time they save. Feature creation tends to be the most time-consuming stage of any modelling effort. It is also a sensitive one: features must be calculated correctly, often thousands at a time, and computed in production in exactly the same way they were computed offline during research. A feature store makes the process of creating features much more streamlined and efficient.

My recommendation: a centralized feature store

My team has gained much value from building and maintaining a centralized feature store where data professionals across the company can create and manage canonical features for others to use. Data scientists can easily add the features they’ve built to the shared store. Once features are there, they are easy to consume both online (in production) and offline (in research), simply by referencing a feature’s canonical name.
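As a rough illustration of consuming features by canonical name, the following Python sketch keeps precomputed feature values in memory. It is not a description of our actual system: the class, its methods, the entity IDs and the feature names are all hypothetical, and a real store would sit on a database rather than a dict.

```python
from typing import Dict, Iterable, List


class InMemoryFeatureStore:
    """Toy store of precomputed values, keyed by entity ID and feature name."""

    def __init__(self) -> None:
        self._values: Dict[str, Dict[str, float]] = {}

    def write(self, entity_id: str, values: Dict[str, float]) -> None:
        """Add or refresh feature values for one entity (e.g. from a batch job)."""
        self._values.setdefault(entity_id, {}).update(values)

    def read(self, entity_id: str, names: Iterable[str]) -> Dict[str, float]:
        """Online use: fetch the named features for a single entity."""
        names = list(names)
        row = self._values.get(entity_id, {})
        missing = [n for n in names if n not in row]
        if missing:
            raise KeyError(f"no value for {missing} on entity {entity_id!r}")
        return {n: row[n] for n in names}

    def read_many(self, entity_ids: Iterable[str], names: Iterable[str]) -> List[Dict[str, float]]:
        """Offline use: fetch the same named features for many entities."""
        names = list(names)
        return [self.read(entity_id, names) for entity_id in entity_ids]
```

The consumer only supplies names such as `"total_spend"`; how and when the value was computed stays inside the store.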

Today, we have thousands of features in our feature store, used in a variety of machine learning projects across the company and across all domains. Our data scientists are adding new features all the time, and new features are calculated automatically and updated daily. This has allowed our team members to avoid repeat work and easily access the wealth of data they need for modelling and research purposes.

For more information, please visit the Bigabid technical blog!


Creating business impact with Data Science and Machine Learning. Leading the Data Science @ BigaBid.