5x Faster Scikit-Learn Parameter Tuning in 5 Lines of Code

Leverage Tune-sklearn to supercharge and scale scikit-learn’s GridSearchCV.


By Michael Chau, Anthony Yu, Richard Liaw


Everyone knows about Scikit-Learn — it’s a staple for data scientists, offering dozens of easy-to-use machine learning algorithms. It also provides two out-of-the-box techniques to address hyperparameter tuning: Grid Search (GridSearchCV) and Random Search (RandomizedSearchCV).

Though effective, both techniques are brute-force approaches to finding the right hyperparameter configurations, which is an expensive and time-consuming process!


What if you wanted to speed up this process?

In this blog post, we introduce tune-sklearn. Tune-sklearn is a drop-in replacement for Scikit-Learn’s model selection module with cutting-edge hyperparameter tuning techniques (Bayesian optimization, early stopping, distributed execution) that provide significant speedups over grid search and random search!

Here’s what tune-sklearn has to offer:

  • Consistency with Scikit-Learn API: tune-sklearn is a drop-in replacement for GridSearchCV and RandomizedSearchCV, so you need to change fewer than 5 lines in a standard Scikit-Learn script to use the API.
  • Modern hyperparameter tuning techniques: tune-sklearn allows you to easily leverage Bayesian Optimization, HyperBand, and other optimization techniques by simply toggling a few parameters.
  • Framework support: tune-sklearn is used primarily for tuning Scikit-Learn models, but it also supports and provides examples for many other frameworks with Scikit-Learn wrappers, such as Skorch (PyTorch), KerasClassifier (Keras), and XGBClassifier (XGBoost).
  • Scale up: Tune-sklearn leverages Ray Tune, a library for distributed hyperparameter tuning, to efficiently and transparently parallelize cross validation on multiple cores and even multiple machines.
A sample of the frameworks supported by tune-sklearn.

Tune-sklearn is also fast. To see this, we benchmark tune-sklearn (with early stopping enabled) against native Scikit-Learn on a standard hyperparameter sweep. In our benchmarks we see significant performance differences on both an average laptop and a large 48-core workstation.

On the larger 48-core machine, Scikit-Learn took 20 minutes to search over 75 hyperparameter sets on a 40,000-sample dataset. Tune-sklearn took a mere 3.5 minutes, sacrificing minimal accuracy.*

Left: a personal dual-core i5 laptop with 8 GB RAM, using a parameter grid of 6 configurations. Right: a large 48-core machine with 250 GB RAM, using a parameter grid of 75 configurations.

* Note: For smaller datasets (10,000 or fewer data points), fitting with early stopping may sacrifice some accuracy. We don’t anticipate this making a difference for most users, as the library is intended to speed up large training tasks with large datasets.

Simple 60-Second Walkthrough

Let’s take a look at how it all works.

Run pip install tune-sklearn "ray[tune]" (the quotes keep shells such as zsh from expanding the brackets) to get started with our example code below.

Hyperparameter set 2 is an unpromising configuration: Tune’s early stopping mechanism detects this and stops the trial early, avoiding wasted training time and resources.

TuneGridSearchCV Example

To start out, it’s as easy as changing our import statement to get Tune’s grid search cross-validation interface:
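A minimal sketch of that change follows; the commented line is the Scikit-Learn import it replaces.

```python
# from sklearn.model_selection import GridSearchCV
from tune_sklearn import TuneGridSearchCV
```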

And from there, we proceed just as we would with Scikit-Learn’s interface! Let’s use a “dummy” custom classification dataset and an SGDClassifier to classify the data.

We choose the SGDClassifier because it has a partial_fit API, which lets tune-sklearn halt training partway through for a given hyperparameter configuration. If the estimator does not support early stopping, tune-sklearn falls back to a parallel grid search.
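Here is a sketch of that setup. The dataset shape and the alpha and epsilon values in the grid are illustrative choices, not anything required by the library:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

# Create a "dummy" classification dataset and hold out a validation set.
X, y = make_classification(
    n_samples=11000, n_features=1000, n_informative=50,
    n_redundant=0, n_classes=10, class_sep=2.5)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1000)

# Example SGDClassifier hyperparameters to grid search over.
parameters = {"alpha": [1e-4, 1e-1, 1], "epsilon": [0.01, 0.1]}
```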

As you can see, the setup here is exactly how you would do it for Scikit-Learn! Now, let’s try fitting a model.
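Continuing from the setup above, here is a sketch of the search and fit. The scheduler name and the max_iters value are example choices:

```python
tune_search = TuneGridSearchCV(
    SGDClassifier(),
    parameters,
    early_stopping="MedianStoppingRule",  # stop unpromising configurations early
    max_iters=10,                         # upper bound on partial_fit iterations per config
)

tune_search.fit(X_train, y_train)
print(tune_search.best_params_)
```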

Note the slight differences we introduced above:

  1. a new early_stopping variable, and
  2. a specification of the max_iters parameter

early_stopping determines when trials are stopped early. MedianStoppingRule is a great default, but see Tune’s documentation on schedulers for a full list to choose from. max_iters is the maximum number of iterations a given hyperparameter set can run for; it may run for fewer iterations if it is stopped early.

Try running this and comparing it to the GridSearchCV equivalent below.
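For reference, a sketch of the plain Scikit-Learn equivalent, using the same estimator and parameter grid:

```python
from sklearn.model_selection import GridSearchCV

# n_jobs=-1 parallelizes the grid search over all local cores.
sklearn_search = GridSearchCV(SGDClassifier(), parameters, n_jobs=-1)
sklearn_search.fit(X_train, y_train)
print(sklearn_search.best_params_)
```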

TuneSearchCV Bayesian Optimization Example

Other than the grid search interface, tune-sklearn also provides an interface, TuneSearchCV, for sampling from distributions of hyperparameters.

In addition, you can easily enable Bayesian optimization over the distributions in TuneSearchCV by changing only a few lines of code.

Run pip install scikit-optimize to try out this example:
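Below is a sketch of a Bayesian-optimized search, reusing the dataset and estimator from the grid search example above. The tuples give lower and upper bounds to sample each hyperparameter from, and the n_trials value is an illustrative choice:

```python
from tune_sklearn import TuneSearchCV

# Tuples define (low, high) bounds to sample each hyperparameter from,
# rather than a discrete grid of values.
param_dists = {
    "alpha": (1e-4, 1e-1),
    "epsilon": (1e-2, 1e-1),
}

tune_search = TuneSearchCV(
    SGDClassifier(),
    param_distributions=param_dists,
    n_trials=2,                      # number of sampled configurations
    early_stopping=True,
    search_optimization="bayesian",  # requires scikit-optimize
)

tune_search.fit(X_train, y_train)
print(tune_search.best_params_)
```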

The parameter distributions (value ranges instead of discrete lists) and the search_optimization argument are the only changes needed to enable Bayesian optimization.

As you can see, it’s very simple to integrate tune-sklearn into existing code. Check out more detailed examples and get started with tune-sklearn here, and let us know what you think! Also take a look at Ray’s replacement for joblib, which lets users parallelize training over multiple nodes, not just one, further speeding up training.
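As an illustration, here is a sketch of running a Scikit-Learn search through Ray’s joblib backend. It assumes Ray is installed and, for multi-node speedups, that a Ray cluster is reachable:

```python
import joblib
from ray.util.joblib import register_ray

register_ray()  # make "ray" available as a joblib backend

# Any joblib-parallelized Scikit-Learn call inside this block is
# distributed across the Ray workers instead of local processes.
with joblib.parallel_backend("ray"):
    sklearn_search.fit(X_train, y_train)
```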

Documentation and Examples

*Note: importing from ray.tune as shown in the linked documentation is available only on the nightly Ray wheels and will be available on pip soon
