5x Faster Scikit-Learn Parameter Tuning in 5 Lines of Code
Leverage Tune-sklearn to supercharge and scale scikit-learn’s GridSearchCV.
By Michael Chau, Anthony Yu, Richard Liaw
Everyone knows about Scikit-Learn — it’s a staple for data scientists, offering dozens of easy-to-use machine learning algorithms. It also provides two out-of-the-box techniques to address hyperparameter tuning: Grid Search (GridSearchCV) and Random Search (RandomizedSearchCV).
Though effective, both techniques are brute-force approaches to finding the right hyperparameter configurations, which is an expensive and time-consuming process!
What if you wanted to speed up this process?
In this blog post, we introduce tune-sklearn. Tune-sklearn is a drop-in replacement for Scikit-Learn’s model selection module with cutting-edge hyperparameter tuning techniques (Bayesian optimization, early stopping, distributed execution) that provide significant speedups over grid search and random search!
Here’s what tune-sklearn has to offer:
- Consistency with the Scikit-Learn API: tune-sklearn is a drop-in replacement for GridSearchCV and RandomizedSearchCV, so you need to change fewer than 5 lines in a standard Scikit-Learn script to use the API.
- Modern hyperparameter tuning techniques: tune-sklearn allows you to easily leverage Bayesian Optimization, HyperBand, and other optimization techniques by simply toggling a few parameters.
- Framework support: tune-sklearn is used primarily for tuning Scikit-Learn models, but it also supports and provides examples for many other frameworks with Scikit-Learn wrappers such as Skorch (Pytorch), KerasClassifiers (Keras), and XGBoostClassifiers (XGBoost).
- Scale up: Tune-sklearn leverages Ray Tune, a library for distributed hyperparameter tuning, to efficiently and transparently parallelize cross validation on multiple cores and even multiple machines.
Tune-sklearn is also fast. To show this, we benchmark tune-sklearn (with early stopping enabled) against native Scikit-Learn on a standard hyperparameter sweep. In our benchmarks we see significant performance differences on both an average laptop and a large 48-core workstation.
On the 48-core machine, Scikit-Learn took 20 minutes to search over 75 hyperparameter sets on a 40,000-sample dataset. Tune-sklearn took a mere 3 and a half minutes, sacrificing minimal accuracy.*
* Note: For smaller datasets (10,000 or fewer data points), there may be a sacrifice in accuracy when attempting to fit with early stopping. We don’t anticipate this to make a difference for users as the library is intended to speed up large training tasks with large datasets.
Simple 60 second Walkthrough
Let’s take a look at how it all works.
Run `pip install tune-sklearn "ray[tune]"` to get started with our example code below.
TuneGridSearchCV Example
To start out, it’s as easy as changing our import statement to get Tune’s grid search cross validation interface:
And from there, we proceed just as we would in Scikit-Learn’s interface! Let’s use a “dummy” custom classification dataset and an `SGDClassifier` to classify the data. We choose the `SGDClassifier` because it has a `partial_fit` API, which enables it to stop fitting to the data for a certain hyperparameter configuration. If the estimator does not support early stopping, we fall back to a parallel grid search.
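As a sketch, the setup might look like this (the dataset shape and grid values are illustrative, in the spirit of the tune-sklearn examples):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# A "dummy" custom classification dataset
X, y = make_classification(
    n_samples=11000,
    n_features=1000,
    n_informative=50,
    n_redundant=0,
    n_classes=10,
    class_sep=2.5,
)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=1000)

# Hyperparameter grid to search over for SGDClassifier
parameter_grid = {"alpha": [1e-4, 1e-1, 1], "epsilon": [0.01, 0.1]}
```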
As you can see, the setup here is exactly how you would do it for Scikit-Learn! Now, let’s try fitting a model.
Note the slight differences we introduced above:
- a new `early_stopping` variable, and
- a specification of the `max_iters` parameter
The `early_stopping` parameter determines when to stop early: MedianStoppingRule is a great default, but see Tune’s documentation on schedulers here for a full list to choose from. `max_iters` is the maximum number of iterations a given hyperparameter set can run for; it may run for fewer iterations if it is stopped early.
Try running this compared to the GridSearchCV equivalent.
TuneSearchCV Bayesian Optimization Example
Other than the grid search interface, tune-sklearn also provides TuneSearchCV, an interface for sampling from distributions of hyperparameters.
In addition, you can enable Bayesian optimization over those distributions in TuneSearchCV with just a few code changes.
Run `pip install scikit-optimize` to try out this example:
As you can see, it’s very simple to integrate tune-sklearn into existing code. Check out more detailed examples, get started with tune-sklearn here, and let us know what you think! Also take a look at Ray’s replacement for joblib, which allows users to parallelize training over multiple nodes, not just one, further speeding up training.
Documentation and Examples
- Documentation*
- Example: Skorch with tune-sklearn
- Example: Scikit-Learn Pipelines with tune-sklearn
- Example: XGBoost with tune-sklearn
- Example: KerasClassifier with tune-sklearn
- Example: LightGBM with tune-sklearn
*Note: importing from `ray.tune` as shown in the linked documentation is available only on the nightly Ray wheels and will be available on pip soon.