Feature Engineer Optimization in HyperparameterHunter 3.0

Hunter McGushion
Towards Data Science
10 min read · Aug 6, 2019

Automatically save and optimize your feature engineering steps alongside hyperparameters with HyperparameterHunter — making optimization smarter and ensuring no Experiment is wasted

Pre-HyperparameterHunter demo of fitting feature engineering into hyperparameter optimization. Photo: Robyn Mackenzie

The long wait is over. HyperparameterHunter 3.0 (Artemis) has arrived, adding support for Feature Engineering, and it comes bearing gifts!

  • Gift #1) Clear and customizable Feature Engineer syntax: a list of your own functions
  • Gift #2) Consistent scaffolding for building Feature Engineering workflows, which are automatically recorded
  • Gift #3) Optimization for Feature Engineering steps, and (obviously) detection of past Experiments to jump-start optimization
  • Gift #4) Your sanity and time: stop keeping track of janky lists of Feature Engineering steps and how they work with all your other hyperparameters

1. Feature Engineering Background

1.1. What Is It?

Lots of people have different definitions for feature engineering and preprocessing, so how does HyperparameterHunter define it?

We’re working with a very broad definition of “feature engineering”, hence the blurred line between it and “preprocessing”. We consider “feature engineering” to be any modification applied to data before model fitting, whether performed once on Experiment start or repeated for every fold in cross-validation. Technically, though, HyperparameterHunter lets you define the particulars of “feature engineering” for yourself, which we’ll see soon. Here are a few things that fall under our umbrella of “feature engineering”:

  • Manual feature creation
  • Scaling/normalization/standardization
  • Re-sampling (see our imblearn example)
  • Target data transformation
  • Feature selection/elimination
  • Encoding (one-hot, label, etc.)
  • Imputation
  • Binarization/binning/discretization

… And lots of other stuff!

1.2. Why Should I Care?

A fair question, since Feature Engineering is rarely a topic in hyperparameter optimization. So why should you want HyperparameterHunter to keep track of Feature Engineering?

First, Feature Engineering is important.

You almost always need to preprocess your data. It is a required step.
Machine Learning Mastery

Second, we usually treat feature engineering steps like hyperparameters anyway: just hyperparameters we’re accustomed to tuning manually.

Should I use StandardScaler or Normalizer? I'll test both and desperately try to remember which (if either) is best for each algorithm. Should I one-hot-encode dates into days of the week, or make a binarized "is_weekend" feature? Do I care about the month? The year? Should I translate the 12 months into four seasons, instead? What about leap years??

In the end, each of the numerous feature engineering steps we use is really just another hyperparameter we should be optimizing — but we're not. Why?

There’s actually a pretty decent reason for feature engineering’s absence from common hyperparameter optimization: it’s hard. It’s not exactly like picking a value between 0.1 and 0.7, or choosing between a sigmoid and a ReLU activation in an NN layer. We’re talking about parametrizing and optimizing a collection of functions that require who-knows-what and return whatever, all in the service of transforming your precious data.

Have you ever thrown together a script to do all your feature engineering, then just dragged it around — for your whole project — unceremoniously adding, removing and modifying pieces as necessary? You’re not alone. By the end of a project, it’s impossible to recreate Experiments since there’s no clear, automated record of feature engineering performed for them. Furthermore, ignoring feature engineering renders hyperparameter optimization completely unreliable. Surely, there must be a better way… And there is!

2. HyperparameterHunter’s Approach

Before we jump in with HyperparameterHunter, let’s take a quick look at our data: SKLearn’s Boston Housing Regression Dataset. We’ll be using the “DIS” column as the target, just like SKLearn’s target transformation example. This dataset has a manageable 506 samples, with 13 features excluding the target.

2.1. Baseline

Because the goal of feature engineering is to produce better models, let’s establish a baseline CVExperiment. As always, we’ll start by setting up an Environment to broadly define the task and how to evaluate results.

We’ll do KFold cross-validation with five splits, and we’ll focus just on median absolute error.
Also, because we’re not cavemen, we’ll tell Environment to set aside a holdout_dataset from the train_dataset via SKLearn's train_test_split.
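
Here’s a minimal sketch of that setup; the get_boston_data helper and the exact argument values are illustrative, not verbatim:

```python
import pandas as pd
from sklearn.datasets import load_boston  # available in scikit-learn circa 2019
from sklearn.model_selection import train_test_split

from hyperparameter_hunter import Environment


def get_boston_data():
    """Return the Boston dataset as a single DataFrame (features plus "MEDV")."""
    boston = load_boston()
    df = pd.DataFrame(data=boston.data, columns=boston.feature_names)
    df["MEDV"] = boston.target
    return df


env = Environment(
    train_dataset=get_boston_data(),
    results_path="HyperparameterHunterAssets",  # where Experiments get recorded
    target_column="DIS",                        # predicting "DIS", per the SKLearn example
    metrics=["median_absolute_error"],
    holdout_dataset=lambda train, target: train_test_split(train, random_state=1),
    cv_type="KFold",
    cv_params=dict(n_splits=5, shuffle=True, random_state=1),
)
```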

Then we'll run a simple CVExperiment with AdaBoostRegressor and its default parameters to see how we do without the flashy, new FeatureEngineer.
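
With the Environment above active, a baseline sketch is about as short as it gets:

```python
from sklearn.ensemble import AdaBoostRegressor

from hyperparameter_hunter import CVExperiment

# Baseline: default AdaBoostRegressor, no FeatureEngineer
baseline = CVExperiment(model_initializer=AdaBoostRegressor, model_init_params={})
```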

2.2. Definitions

Having established our baseline MAE of 0.51 for out-of-fold predictions, let’s look at a few Feature Engineering steps we can take to chip away at it.

2.2.A. Manual Feature Creation
Since we’re creative and we like having fun with Feature Engineering, we’ll start by adding our very own feature to our input data. Let’s make a feature that is the Euclidean norm, or ℓ2-norm, of the 13 other features! In the spirit of creativity, let’s creatively name our Euclidean norm function euclidean_norm:
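
A sketch of euclidean_norm (the all_inputs argument is explained in section 2.3, and computing the norm via np.linalg.norm is just one way to do it):

```python
import numpy as np


def euclidean_norm(all_inputs):
    """Append a feature holding the l2-norm of each sample's 13 original features."""
    all_inputs["euclidean_norm"] = np.linalg.norm(all_inputs.values, axis=1)
    return all_inputs
```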

2.2.B. Input Scaling
Next, we have to do input scaling because it’s what all the cool kids are doing these days. In all seriousness though, scaling your data is usually a good idea.
Remember to fit_transform with our train_inputs, then just transform our non_train_inputs (validation/holdout data) to avoid data leakage.
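
A sketch along those lines, assuming SKLearn’s StandardScaler:

```python
from sklearn.preprocessing import StandardScaler


def standard_scale(train_inputs, non_train_inputs):
    """Fit a scaler on training inputs only, then transform all inputs."""
    scaler = StandardScaler()
    train_inputs[train_inputs.columns] = scaler.fit_transform(train_inputs.values)
    non_train_inputs[non_train_inputs.columns] = scaler.transform(non_train_inputs.values)
    return train_inputs, non_train_inputs
```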

2.2.C. Target Transformation
Our last Feature Engineering step will use SKLearn’s QuantileTransformer to uniformly distribute our target outputs, thereby spreading out the most frequently occurring values and reducing the impact of outliers. As with our input scaling, we must take care to fit_transform only our train_targets, then transform the non_train_targets.
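
A sketch of quantile_transform; note the third return value, the fitted transformer, which lets predictions be inverse-transformed before scoring:

```python
from sklearn.preprocessing import QuantileTransformer


def quantile_transform(train_targets, non_train_targets):
    """Map targets to a uniform distribution, fitting on training targets only."""
    transformer = QuantileTransformer(output_distribution="uniform", n_quantiles=100)
    train_targets[train_targets.columns] = transformer.fit_transform(train_targets.values)
    non_train_targets[non_train_targets.columns] = transformer.transform(non_train_targets.values)
    return train_targets, non_train_targets, transformer
```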

2.3. When Do We Get to HyperparameterHunter?

I know what you’re thinking, and I’m thinking the same thing. Enough stalling. Show me how to do all that in HyperparameterHunter.

Confession time: I may have tried to sneak past without mentioning that our clear and concise functions above are all we need to make FeatureEngineer.

But, Hunter, the syntax for defining Feature Engineering steps was so smooth and logical! I never imagined that HyperparameterHunter would expect them in exactly the same format that I already use! That’s insane! But how???
— You, probably

Well, dear reader, the secret ingredient is in the function signatures above, specifically the input arguments. We call these functions EngineerStep functions because each one makes an EngineerStep. A FeatureEngineer, then, is simply a list of EngineerSteps or functions.

Back to the secret ingredient. An EngineerStep function is just a normal function in which you do whatever data processing you want. You only need to tell it the data you want in the signature’s arguments. Astute readers may have already noticed the pattern in our EngineerStep functions above, but here’s a handy formula to remember valid EngineerStep function arguments.

Buckle up. This math is pretty advanced, but don’t worry; I’m a professional…
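
Rendered as a throwaway Python snippet (a rough reconstruction of the formula), it looks like this:

```python
# Valid EngineerStep function arguments: one dataset name + one suffix
datasets = {"train", "validation", "holdout", "test"}
suffixes = {"_inputs", "_targets"}
valid_args = {d + s for d in datasets for s in suffixes} - {"test_targets"}
# -> {"train_inputs", "train_targets", "validation_inputs", "validation_targets",
#     "holdout_inputs", "holdout_targets", "test_inputs"}
# Note: "test_targets" is excluded, since test targets aren't tracked
```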

Just take one string from the first set, stick one of the strings from the second set to it, and you’ve got a valid EngineerStep function argument. The other important part is what the function returns. Fortunately, that’s even easier to remember. Return the new values of your signature arguments. You can also optionally return a transformer to perform inverse target transformations, as we did with quantile_transform above. But wait! There’s more!

We also have two aliases to combine data for easier processing, which we already used in our functions above! Let’s update our highly-complex and nuanced formula to add the bonus EngineerStep function argument aliases:
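
Continuing the snippet above, the bonus aliases just add two more dataset names:

```python
# Bonus aliases that combine datasets for easier processing
datasets |= {"all", "non_train"}
valid_args = {d + s for d in datasets for s in suffixes} - {"test_targets"}
# Adds: "all_inputs", "all_targets" (every dataset, concatenated), plus
#       "non_train_inputs", "non_train_targets" (everything except training data)
```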

As the new arguments’ names imply, “all_inputs”/“all_targets” give you a big DataFrame of all of your datasets’ inputs/targets. “non_train_inputs” and “non_train_targets” are similar, except they leave out all training data. The notes below each formula are reminders that “test_inputs” does not have a targets counterpart argument, since we don’t track test targets by design.

3. Diving In

Armed with our newfound understanding of how to make our own FeatureEngineer steps, let’s start using FeatureEngineer with CVExperiment.

We just need the feature_engineer kwarg of CVExperiment, or of any OptPro's forge_experiment method. feature_engineer can be a FeatureEngineer instance, or a list of EngineerSteps/functions like the ones we defined above.

3.1. Test the Waters: Experimentation

Remember our baseline Experiment finished with a median absolute error of 0.51 for our OOF data. Let’s test the waters with a few FeatureEngineer-enhanced CVExperiments to see what happens...
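
A sketch of those Experiments, each handing a different step to feature_engineer (which steps you combine is up to you):

```python
from sklearn.ensemble import AdaBoostRegressor

from hyperparameter_hunter import CVExperiment, FeatureEngineer

# Experiment #1: only the handmade feature
exp_1 = CVExperiment(
    model_initializer=AdaBoostRegressor,
    model_init_params={},
    feature_engineer=FeatureEngineer([euclidean_norm]),
)

# Experiment #2: only input scaling
exp_2 = CVExperiment(
    model_initializer=AdaBoostRegressor,
    model_init_params={},
    feature_engineer=[standard_scale],  # a plain list works, too
)

# Experiment #3: only target transformation
exp_3 = CVExperiment(
    model_initializer=AdaBoostRegressor,
    model_init_params={},
    feature_engineer=FeatureEngineer([quantile_transform]),
)
```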

Well that was easy…
Let’s absorb what just happened. Three different CVExperiments, each with a different FeatureEngineer. Experiment #1 performed about as well as our baseline. #2 was a bit better. Then in #3, we saw a decent reduction in error from 0.51 to 0.46. Perhaps we can call quantile_transform the only important Feature Engineering step and go home! But how can we be sure?

3.2. Face First: Optimization

To those who skipped section “3.1. Test the Waters”: I also like to live dangerously. While using FeatureEngineer in CVExperiment was awesome, it’d be even better to just let HyperparameterHunter's OptPros handle testing all the different combinations of Feature Engineering steps for us!

Now, you may be worried that adding optimization is bound to complicate Feature Engineering. Well you can relax because we only need our OptPros’ forge_experiment method, which is exactly like initializing a CVExperiment!

To search over a space of different EngineerSteps, simply put the steps inside Categorical, just like standard hyperparameter optimization! Categorical also has an optional kwarg for those times when the mad scientist in us all wants to try a particularly questionable EngineerStep. If optional=True (default=False), the search space will include not only the categories explicitly given, but also the omission of the current EngineerStep entirely.

Before we can do Feature Optimization, we’ll need some more EngineerStep functions to optimize. Perhaps we want to try some other scaling methods, aside from standard_scale, so let's define min_max_scale and normalize.
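
Sketches of the two extra scalers, mirroring standard_scale:

```python
from sklearn.preprocessing import MinMaxScaler, Normalizer


def min_max_scale(train_inputs, non_train_inputs):
    """Scale each feature to [0, 1], fitting on training inputs only."""
    scaler = MinMaxScaler()
    train_inputs[train_inputs.columns] = scaler.fit_transform(train_inputs.values)
    non_train_inputs[non_train_inputs.columns] = scaler.transform(non_train_inputs.values)
    return train_inputs, non_train_inputs


def normalize(train_inputs, non_train_inputs):
    """Scale each sample (row) to unit norm."""
    normalizer = Normalizer()
    train_inputs[train_inputs.columns] = normalizer.fit_transform(train_inputs.values)
    non_train_inputs[non_train_inputs.columns] = normalizer.transform(non_train_inputs.values)
    return train_inputs, non_train_inputs
```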

That’s probably enough hype. Let’s see Feature Optimization in action!
Notice that in classic HyperparameterHunter fashion, our OptPro below automatically figures out that our four Experiments above are compatible with our search space and uses them as learning material to jump-start optimization.
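
A sketch of that OptPro; the iteration count is illustrative, and every step is wrapped in an optional Categorical so the earlier Experiments fall inside the search space:

```python
from sklearn.ensemble import AdaBoostRegressor

from hyperparameter_hunter import BayesianOptPro, Categorical, FeatureEngineer

opt_0 = BayesianOptPro(iterations=12, random_state=32)
opt_0.forge_experiment(
    model_initializer=AdaBoostRegressor,
    model_init_params={},
    feature_engineer=FeatureEngineer([
        Categorical([euclidean_norm], optional=True),
        Categorical([standard_scale, min_max_scale, normalize], optional=True),
        Categorical([quantile_transform], optional=True),
    ]),
)
opt_0.go()
```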

Blue rectangle added around scores of new Experiments conducted by OptPro

Whenever an OptPro finds an Experiment that scores better than our current best, it helpfully colors the score in pink and the hyperparameters in green. With 16 Experiments, our OptPro is just warming up, but quantile_transform continues to look promising. Additionally, it seems that searching through some different scalers might be paying off, as our new best Experiment uses the recently added min_max_scale, rather than standard_scale.

4. Back to Our Roots

Some nerd (me) + my friends, who won’t appreciate how sick this burn is

Since this is HyperparameterHunter and hunting down optimal hyperparameters is kind of our whole thing, let’s get back to our roots. We’re gonna mix our new Feature Optimization techniques with some classical hyperparameter optimization, because no one wants to be stuck with local optima.

In addition to adding classical hyperparameter optimization, let’s pretend to be extremely confident that euclidean_norm is important (although it doesn’t actually seem to be) and make it a required EngineerStep by removing the Categorical enclosing it. Note that this change means our OptPro will learn from only 8 saved Experiments of the 16 total candidates because we’re restricting its search space.
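
A sketch of the updated OptPro; the AdaBoostRegressor ranges are illustrative stand-ins for whatever classical hyperparameters you care about, and euclidean_norm now appears bare, with no enclosing Categorical:

```python
from sklearn.ensemble import AdaBoostRegressor

from hyperparameter_hunter import (
    BayesianOptPro, Categorical, FeatureEngineer, Integer, Real)

opt_1 = BayesianOptPro(iterations=10, random_state=32)
opt_1.forge_experiment(
    model_initializer=AdaBoostRegressor,
    model_init_params=dict(
        n_estimators=Integer(25, 100),
        learning_rate=Real(0.5, 1.0),
        loss=Categorical(["linear", "square", "exponential"]),
    ),
    feature_engineer=FeatureEngineer([
        euclidean_norm,  # required step: no enclosing Categorical
        Categorical([standard_scale, min_max_scale, normalize], optional=True),
        Categorical([quantile_transform], optional=True),
    ]),
)
opt_1.go()
```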

quantile_transform continues to outperform no target transformation at all, but let’s add some real competition with power_transform.
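
A sketch of power_transform, assuming SKLearn’s PowerTransformer with the Yeo-Johnson method; like quantile_transform, it returns the fitted transformer for inverse target transformation:

```python
from sklearn.preprocessing import PowerTransformer


def power_transform(train_targets, non_train_targets):
    """Power-transform targets toward a Gaussian, fitting on training targets only."""
    transformer = PowerTransformer(method="yeo-johnson", standardize=True)
    train_targets[train_targets.columns] = transformer.fit_transform(train_targets.values)
    non_train_targets[non_train_targets.columns] = transformer.transform(non_train_targets.values)
    return train_targets, non_train_targets, transformer
```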

We can also get a second opinion by switching from BayesianOptPro to RandomForestOptPro (or any other OptPro). Looking at all our Experiments above, it seems normalize isn’t doing too well, so let’s get rid of it. In fact, let’s say that we definitely want either standard_scale or min_max_scale, so we’ll drop normalize from the mix and remove the optional=True part at the end of our second EngineerStep. Perhaps we were also a bit overzealous deciding that euclidean_norm was such a big deal, so let’s make that first EngineerStep optional again. Of course, we also need to add our new power_transform as a choice for our last EngineerStep.

In summary, the OptPro below will modify all three EngineerSteps above, and we’re going to try out RandomForestOptPro for a change of pace.
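
A sketch with all three changes applied (keeping the last EngineerStep optional is a judgment call, and the iteration count is illustrative):

```python
from sklearn.ensemble import AdaBoostRegressor

from hyperparameter_hunter import (
    Categorical, FeatureEngineer, Integer, RandomForestOptPro, Real)

opt_2 = RandomForestOptPro(iterations=10, random_state=32)
opt_2.forge_experiment(
    model_initializer=AdaBoostRegressor,
    model_init_params=dict(
        n_estimators=Integer(25, 100),
        learning_rate=Real(0.5, 1.0),
        loss=Categorical(["linear", "square", "exponential"]),
    ),
    feature_engineer=FeatureEngineer([
        Categorical([euclidean_norm], optional=True),  # optional again
        Categorical([standard_scale, min_max_scale]),  # normalize dropped; step now required
        Categorical([quantile_transform, power_transform], optional=True),
    ]),
)
opt_2.go()
```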

Despite changing the space of our entire FeatureEngineer and even getting a new OptPro to run the show, we had no problem identifying the 16 matching Experiments out of 26 saved candidates that could be used to jump-start optimization. I’d say that’s a lot better than starting over from scratch.

Even better, we have a new best Experiment with a much-improved 0.37 MAE, down from our baseline of 0.51 without any Feature Engineering.

Now Fly, You Magnificent Peacock!

The magical part of these results is that they all get saved locally to your computer, which means you’ll get to keep using them for days, weeks, years, even generations to come! Ok, maybe not the last part.
The point is that when you move away from this toy problem and start building models that take hours to train, why would you settle for things like re-running the same model again, or robbing optimization of the valuable information provided by past Experiments, or having to keep track of all these ridiculous hyperparameters and Feature Engineering steps manually?

Spread your wings! Let HyperparameterHunter take care of all that annoying stuff, so you can stop struggling to keep track of everything, and spend your time actually doing machine learning.

If you still haven’t had enough HyperparameterHunter, check out our repo’s README to get started quickly, or our many examples, or this excellent article (pre-HH-3.0) by Javier Rodriguez Zaurin on putting ML into production.
