Using the hyperparameter-tuning utilities defined in sklearn for Deep Learning models developed in Keras has long been a challenge, especially for models defined using the Keras Functional API. SciKeras, however, is here to change that. In this article we explore creating a wrapper for a non-sequential model (CapsNet) with multiple inputs and multiple outputs (a MIMO estimator), and fitting this classifier with GridSearchCV.

A bit about hyperparameters and tuning
If you are familiar with Machine Learning, you must have heard of hyperparameters. Readers acquainted with sklearn, Keras, and hyperparameter tuning in sklearn can skip this part. For the link to the GitHub repo, scroll to the end. To give a refresher anyway: hyperparameters are a set of properties of any machine learning or deep learning model that users can specify to change the way the model is trained. They are not learnable (the nomenclature for learnable properties is parameters, or weights), i.e., they are user-defined. Often, hyperparameters control the way the model is trained, for example, the learning rate (α) or the type of regularization used.
Hyperparameter tuning/optimization is one of the crucial steps in designing a Machine Learning or Deep Learning model. This step often demands considerable knowledge of how the model is trained and how it applies to the problem being solved, especially when done manually. Moreover, manual tuning burdens the Data Scientist with keeping track of all the hyperparameters they may have tried. This is where automated hyperparameter tuning with the help of scikit-learn (sklearn) comes into play.

Scikit-learn provides multiple APIs under sklearn.model_selection for hyperparameter tuning. But the caveat with sklearn is that it is largely meant for classical Machine Learning models – there are no deep learning models defined in the API. Fortunately, the Keras API, popular among Deep Learning practitioners for defining and training Deep Learning models in a simplified manner, has sklearn wrapper classes for Deep Learning models defined in Keras. This means that one can write one's own Deep Learning model in Keras and then convert it into a sklearn-like model using these wrappers.
Sounds great so far, right? Well… not so fast. The wrappers defined under Keras (or tensorflow.keras, for that matter) could, until now, wrap your model either as a classifier (KerasClassifier) or as a regressor (KerasRegressor). Moreover, if you wanted to wrap a model defined using the Keras Functional API, i.e., not a sequential model [read more about Sequential vs. Functional API in Keras], that was not possible either. So this was a limitation when one wanted to tune the hyperparameters of a more complicated deep learning model using the sklearn APIs (and the reason why I am so excited to write this article).
A primer on tf.keras wrappers
For those unfamiliar with the wrappers, their use is illustrated in the code example below. We define a get_model() function that returns a compiled Keras model. The model is then wrapped into clf using KerasClassifier. The clf created in the example has all the attributes and methods of a sklearn classifier and can be used as such.
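Here is a minimal sketch of this pattern; the toy data, layer sizes, and training arguments are illustrative assumptions, not a specific real model.
import numpy as np
from tensorflow import keras
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

def get_model():
    # Returns a compiled Keras model, as the wrapper expects.
    model = keras.Sequential([
        keras.layers.Dense(16, activation="relu", input_shape=(20,)),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

clf = KerasClassifier(build_fn=get_model, epochs=5, batch_size=32, verbose=0)

# clf now behaves like any sklearn classifier.
X = np.random.rand(100, 20).astype(np.float32)
y = np.random.randint(0, 2, size=100)
clf.fit(X, y)
print(clf.score(X, y))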
Enter SciKeras
SciKeras is the successor to tf.keras.wrappers.scikit_learn, and offers many improvements over the TensorFlow version of the wrappers.
SciKeras offers many much-awaited APIs that enable developers to interface their TensorFlow models with sklearn, including models based on the Keras Functional API as well as subclassed Keras models. For a full list of new offerings, refer to this. The package can be easily installed with a simple pip install, and the wrappers imported from scikeras.wrappers.
pip install scikeras
from scikeras.wrappers import KerasClassifier, KerasRegressor
These wrappers are largely backwards compatible with KerasClassifier or KerasRegressor if they are already being used in your code, except for the renaming of the build_fn parameter to model.
clf = KerasClassifier(build_fn=get_model, ...)  # Old
clf = KerasClassifier(model=get_model, ...)     # New
Another change to take note of for hyperparameter tuning with these wrappers is that defining tunable parameters in get_model with default values is discouraged. Users are instead expected to declare all tunable arguments of the get_model function as keyword arguments to the wrapper constructor.
#def get_model(param_1=value_1, param_2=value_2, ...):  # Discouraged
def get_model(param_1, param_2, ...):
    ...
    return model

clf = KerasClassifier(model=get_model, param_1=value_1, param_2=value_2, ...)
clf = KerasClassifier(model=get_model, model__param_1=value_1, model__param_2=value_2, ...)
Appending model__ before the arguments also reserves those parameters to be passed to the get_model function (see Routed Parameters). A few more changes to the code may be needed, depending on whether categorical_crossentropy is used and on the way fit is called (refer to the complete list). We will not delve into the details of those implementations.
Multiple Inputs / Multiple Outputs
Scikit-Learn natively supports multiple outputs, although it technically requires them to be arrays of equal length (see docs for Scikit-Learn's [MultiOutputClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.multioutput.MultiOutputClassifier.html#sklearn.multioutput.MultiOutputClassifier)). Scikit-Learn has no support for multiple inputs.
Many non-trivial Deep Learning models used in research and industry have either multiple inputs or multiple outputs, or both. Such models can be easily described and trained in Keras. However, using such models in sklearn becomes a challenge, since sklearn expects the X and y of a model to each be a single n-dimensional numpy array (multiple arrays of the same length are allowed for y). Concatenation into a single array is straightforward if all of the inputs/outputs have the same shape, but it can quickly get messy when they have different shapes, as is the case with a CapsNet model (more on this later).
In order to have multiple inputs and/or multiple outputs for a model, SciKeras allows the use of custom data transformers. The examples in the official documentation, for achieving this with input and/or output lists containing arrays of mismatched shapes, employ a reshaping of the inputs/outputs from an array of shape [E_dim1, E_dim2, E_dim3, ...] to [E_dim1, E_dim2*E_dim3*...], where E can be either an input or an output, effectively reshaping all the inputs into a 2-dimensional numpy array.
These custom transformers, depending on whether they are used for transforming X (features) or y (targets), can then be used from a custom estimator to override either scikeras.wrappers.BaseWrappers.feature_encoder() or scikeras.wrappers.BaseWrappers.target_encoder(), respectively. Moreover, for models with multiple outputs, defining a custom scorer is advisable, especially when the outputs have different shapes or use different metrics.
CapsNet
At the risk of oversimplifying: CapsNet is a novel architecture, proposed by Sabour, Frosst, and Hinton in late 2017 [1], whose network is designed to perform well without the use of pooling layers. This is achieved by using capsules, which perform a form of 'inverse rendering' that is learned via dynamic routing-by-agreement. For this tutorial we will not be going into the theory of CapsNet – those interested can read this article for a working understanding, and refer to the original paper [1] for more details.

What we are interested in is the implementation of the Capsule Network and its overall architecture, since that is what we want to wrap with SciKeras. The implementation used in this tutorial is based on the code made openly available by Xifeng Guo. The illustration shows a high-level view of the implemented architecture, with the approximate flow of inputs and outputs.
Design aspects covered in this implementation
- The capsule layers need to be defined by the user or imported.
- Dynamic routing of capsules via routing-by-agreement defines a custom flow of data within the model (implemented in the user-defined capsule layer).
- The outputs are not of the same type – a One-Hot-Encoded (OHE) vector and a flattened image – instead of both being labels (as for a classifier) or continuous values (as for a regressor).
Designing the Wrapper
Building on our discussion so far, the wrapper needs to override both BaseWrappers.feature_encoder() and BaseWrappers.target_encoder(). Depending on the type of transformation required, we can either write our own custom transformer or use one of the many transformers already offered in [sklearn.preprocessing](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.preprocessing). For this tutorial, we will demonstrate both ways of transforming: we will write a custom transformer for the outputs and use a library transformer for the inputs.
Further, since the way the Keras model is trained cannot be strictly mirrored by that of a classifier or a regressor (due to the reconstruction module), we will subclass BaseWrapper when defining our estimator. Moreover, for performance comparison of the model we need to consider two outputs – hence, a custom scorer will also be needed.
Output Transformer
For our specific implementation, the targets needed by the Keras model have to be in the form [y_true, X_true], while sklearn expects a single numpy array to be fed as the targets array. The transformer we define needs to interface seamlessly between the two. This is achieved by fitting the transformer to the outputs in the fit method, defining a transform method that reshapes the targets into the list of arrays expected by Keras, and an inverse_transform method that reshapes them back into the form expected by sklearn.
We create our custom transformer MultiOutputTransformer by subclassing (inheriting from) sklearn's BaseEstimator and TransformerMixin classes, and define a fit method. This method can be used to incorporate multiple library encoders (like LabelEncoder or OneHotEncoder) into a single transformer, as demonstrated in the official tutorial, depending on the type of outputs. These encoders can be fit to the targets so that the transform and inverse_transform methods work appropriately. In this method, it is necessary to set the self.n_outputs_expected_ attribute to inform SciKeras about the outputs from fit, while other parameters in meta can be set optionally. This method must return self.
In the code presented here, however, I have tried to demonstrate the implementation for when no transformation of the targets is needed, except for a possible separation and rearrangement. It should be noted that a FunctionTransformer over an identity function could achieve this as well (as demonstrated in the next section).
The get_metadata function is optionally defined for cases where model_build_fn accepts a meta parameter. Specific to this code, the transform method is straightforward; in the inverse_transform method, we need to define our own inverse transformation, since we have no library encoders to rely on. A minimal sketch of such a transformer follows.
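The sketch below assumes the concatenated target layout is a 10-class OHE vector followed by a flattened 28x28x1 image; the split point and shapes are illustrative assumptions, not the article's exact code.
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class MultiOutputTransformer(BaseEstimator, TransformerMixin):
    def fit(self, y):
        # No library encoders are needed here; we only record the
        # metadata SciKeras requires and return self.
        self.n_outputs_expected_ = 2  # [y_true, X_true]
        return self

    def get_metadata(self):
        # Optional: made available to model_build_fn via the meta parameter.
        return {"n_outputs_expected_": self.n_outputs_expected_}

    def transform(self, y):
        # sklearn hands us one 2-D array: the OHE labels concatenated
        # with the flattened image. Split it into the list Keras expects.
        y_true, X_true = y[:, :10], y[:, 10:]
        return [y_true, X_true.reshape(-1, 28, 28, 1)]

    def inverse_transform(self, y):
        # Keras returns [y_pred, X_recon]; concatenate back into the
        # single 2-D array sklearn expects.
        y_pred, X_recon = y
        return np.column_stack([y_pred, X_recon.reshape(len(X_recon), -1)])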
Input Transformer
For the input transformer, we will use a library transformer already available in sklearn.preprocessing – the FunctionTransformer. With FunctionTransformer, it is possible to pass a lambda function as the func parameter of the transformer's constructor. But a lambda function can cause issues with pickle, so we instead define a separate, named function to pass into FunctionTransformer.
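A minimal sketch of this idea follows; the two-input split (a 28x28x1 image plus its OHE label, as the CapsNet decoder expects) and the split point are assumptions for illustration.
from sklearn.preprocessing import FunctionTransformer

def split_inputs(X):
    # sklearn passes a single 2-D array; split it into the list of
    # arrays expected by the Keras model's two inputs.
    images = X[:, :784].reshape(-1, 28, 28, 1)
    labels = X[:, 784:]
    return [images, labels]

# A named function (rather than a lambda) keeps the transformer picklable.
input_transformer = FunctionTransformer(split_inputs)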
MIMO Estimator
To finish up the wrapper, we subclass BaseWrapper, as mentioned previously, and override the feature_encoder, scorer, and target_encoder functions. Note that in the scorer function we evaluate only the output of the capsule layer, since that is the metric on which we want our cross-validation epochs to optimize the network.
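The sketch below shows one way to wire this up, reusing the transformers defined above; the accuracy-based scorer and the 10-column split are assumptions.
from scikeras.wrappers import BaseWrapper
from sklearn.metrics import accuracy_score

class MIMOEstimator(BaseWrapper):
    @property
    def feature_encoder(self):
        # Library transformer for the inputs.
        return FunctionTransformer(split_inputs)

    @property
    def target_encoder(self):
        # Custom transformer for the targets.
        return MultiOutputTransformer()

    @staticmethod
    def scorer(y_true, y_pred, **kwargs):
        # Score only the capsule (classification) output; the
        # reconstruction output is ignored for model selection.
        return accuracy_score(y_true[:, :10].argmax(axis=1),
                              y_pred[:, :10].argmax(axis=1))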
Hyperparameter Tuning with MIMOEstimator
The next steps are pretty similar to the first example using the wrappers in tf.keras. We instantiate MIMOEstimator using get_model and pass the (hyper)parameters to get_model as routed parameters (with the model__ prefix). These routed arguments also include the hyperparameters that we would like to tune using grid search.
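For instance, instantiation might look like the following; the particular hyperparameter names (n_class, routings) and values are illustrative assumptions about get_model's signature.
clf = MIMOEstimator(
    model=get_model,
    model__n_class=10,   # routed to get_model (assumed parameter)
    model__routings=3,   # routed to get_model (assumed parameter)
    epochs=10,
    batch_size=32,
    verbose=0,
)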
Next, we define the params dict containing, as key-value pairs, the hyperparameters to tune and the corresponding lists of values to try. We use clf as the estimator to create a GridSearchCV object, and then fit it to the data.
Care must be taken while specifying the cv argument of GridSearchCV to maintain a suitable relation between the number of training examples (n), the batch size (b), and the number of cross-validation folds (cv): n should be exactly divisible by cv * b.
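Putting this together, the grid search might look like the sketch below; the parameter grids are assumptions.
from sklearn.model_selection import GridSearchCV

params = {
    "model__routings": [2, 3],  # assumed tunable hyperparameter
    "batch_size": [25, 50],
}

# X, y are the features and targets in the concatenated 2-D form
# described above; choose cv and batch_size so n is divisible by cv * b.
gs = GridSearchCV(clf, params, cv=3)
gs_res = gs.fit(X, y)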
The results of the grid search are accumulated in gs_res after the fit operation. The best estimator can be obtained via the best_estimator_ attribute of gs_res; similarly, best_score_ gives the best score, and best_params_ gives the best combination of hyperparameters.
So, there it is – how we can write a custom wrapper with minimal coding to use Keras models in conjunction with the sklearn API. I hope you found it helpful. If you have any suggestions or questions, please tell me about them in the comments section, especially if there is a use case/model where this wrapping fails. You can find the full code of the implementation below, along with a few more resources.
Code and other resources
Full code of this implementation can be found [here](https://gist.github.com/nairouz/5b65c35728d8fb8ec4206cbd4cbf9bea). A tutorial on custom Keras layers can be found [here](https://keras.io/api/layers/core_layers/masking/). Implemented CapsNet layers can be found [here](https://github.com/naturomics/CapsLayer).
Academic References
[1] S. Sabour, N. Frosst, and G. E. Hinton, Dynamic Routing Between Capsules (2017), Advances in Neural Information Processing Systems 30, pp. 3856–3866.