
Scikeras Tutorial: A MIMO Wrapper for CapsNet Hyperparameter Tuning with Keras

A tutorial explaining how to wrap custom-architecture models built with the Keras Functional API for use in sklearn, with hyperparameter tuning.


Using the hyperparameter tuning utilities defined in sklearn with Deep Learning models developed in Keras has long been a challenge, especially for models defined using the Keras Functional API. Scikeras, however, is here to change that. In this article we explore creating a wrapper for a non-sequential model (CapsNet) with multiple inputs and multiple outputs (a MIMO estimator), and fitting this classifier with GridSearchCV.

Photo by yinka adeoti on Unsplash

A bit about hyperparameters and tuning

If you are familiar with Machine Learning, you must have heard of hyperparameters. Readers acquainted with sklearn, Keras, and hyperparameter tuning in sklearn can skip this part. (For the link to the GitHub repo, scroll to the end.) To give a refresher anyway: hyperparameters are a set of properties of any machine learning or deep learning model that users can specify to change the way the model is trained. They are not learnable (the nomenclature for learnable properties is parameters, or weights); that is, they are user-defined. Hyperparameters often control the way the model is trained, for example, the learning rate (α) or the type of regularization used.

Hyperparameter tuning/optimization is one of the crucial steps in designing a Machine Learning or Deep Learning model. This step often demands considerable knowledge of how the model is trained and how it applies to the problem being solved, especially when done manually. Moreover, manual tuning burdens the Data Scientist with keeping tabs on all the hyperparameters they may have tried. This is where automated hyperparameter tuning with the help of scikit-learn (sklearn) comes into play.

Photo by Scott Graham on Unsplash

Scikit-learn provides multiple APIs under sklearn.model_selection for hyperparameter tuning. The caveat with using sklearn, however, is that it is largely built for Machine Learning models only – no deep learning models are defined in the API. Fortunately, the Keras API, popular among Deep Learning practitioners for defining and training Deep Learning models in a simplified manner, has sklearn wrapper classes for Deep Learning models defined in Keras. This means that one can write one’s own Deep Learning model in Keras, and then convert it into a sklearn-like model using these wrappers.

Sounds great so far, right? Well… not so fast. The wrappers defined under Keras (or tensorflow.keras, for that matter) could, until now, wrap your model either as a classifier (KerasClassifier) or as a regressor (KerasRegressor). Moreover, if you wanted to wrap a model defined using the Keras Functional API, i.e., not a sequential model [Read more about Sequential vs Functional API in Keras], that was not possible either. So, this was a limitation when one wanted to tune the hyperparameters of a more complicated deep learning model using the sklearn APIs (and the reason why I am so excited to write this article).

A primer on tf.keras wrappers:

For those unfamiliar with the wrappers, their use is illustrated in the code example below. We define a get_model() function that returns a compiled Keras model. The model is then wrapped into clf using KerasClassifier. The clf created in the example has all the attributes and members of a sklearn classifier and can be used as such.
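
A minimal sketch of this pattern, assuming a simple dense classifier (the architecture, data shapes, and parameter values here are illustrative):

import numpy as np
from tensorflow import keras
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

def get_model():
    # Return a compiled Keras model, as the wrapper expects
    model = keras.Sequential([
        keras.layers.Dense(32, activation="relu", input_shape=(20,)),
        keras.layers.Dense(2, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    return model

clf = KerasClassifier(build_fn=get_model, epochs=2, batch_size=32, verbose=0)

# clf now behaves like a sklearn classifier: fit / predict / score
X = np.random.rand(100, 20)
y = np.random.randint(0, 2, size=100)
clf.fit(X, y)
print(clf.predict(X[:5]))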


Enter Scikeras

SciKeras is the successor to tf.keras.wrappers.scikit_learn, and offers many improvements over the TensorFlow version of the wrappers.

Scikeras offers many much-awaited APIs that enable developers to interface their tensorflow models with sklearn, including Functional API based models as well as subclassed Keras models. For a full list of new offerings, refer to this. The package can be easily installed with a simple pip install, and the wrappers imported from scikeras.wrappers.

pip install scikeras
from scikeras.wrappers import KerasClassifier, KerasRegressor

These wrappers are largely backwards compatible with KerasClassifier or KerasRegressor if they are already being used in your code, except for the renaming of the build_fn parameter to model.

clf = KerasClassifier(build_fn=get_model,...) #Old
clf = KerasClassifier(model=get_model,....)   #New

Another change to take note of for hyperparameter tuning with these wrappers: defining tunable parameters in get_model with a default value is not encouraged. Users are instead expected to declare all tunable arguments of the get_model function as keyword arguments to the wrapper constructor.

#def get_model(param_1=value_1, param_2=value_2,...): -> Discouraged
def get_model(param_1, param_2,...):
    ...
    ...
    return model
clf = KerasClassifier(model=get_model, param_1=value_1, param_2=value_2, ...)
# or, equivalently, with explicit routing to get_model:
clf = KerasClassifier(model=get_model, model__param_1=value_1, model__param_2=value_2, ...)

Appending model__ before the arguments also reserves those parameters to be passed to the get_model function (see Routed Parameters). A few more changes to the code may be needed, depending on whether categorical_crossentropy is used and on the way fit is called (refer to the complete list). We will not delve into the details of those implementations.


Multiple Inputs / Multiple Outputs

Scikit-Learn natively supports multiple outputs, although it technically requires them to be arrays of equal length (see docs for Scikit-Learn’s [MultiOutputClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.multioutput.MultiOutputClassifier.html#sklearn.multioutput.MultiOutputClassifier)). Scikit-Learn has no support for multiple inputs.

Many non-trivial Deep Learning models used in research and industry have multiple inputs, multiple outputs, or both. Such models can be easily described and trained in Keras. However, using such models in sklearn becomes a challenge, since sklearn expects the X and y of a model to each be a single n-dimensional numpy array (multiple arrays of the same length are allowed for y). Now, the concatenation into a single array can be straightforward if all of the inputs/outputs are of the same shape. However, this can quickly get messy when the inputs and outputs have different shapes, as is the case with a CapsNet model (more on this later).

In order to have multiple inputs and/or multiple outputs for a model, SciKeras allows the use of custom data transformers. The examples given in the official documentation, for achieving this with input and/or output lists containing arrays of unmatching shapes, employ a reshaping of the inputs/outputs from an array of shape [E_dim1, E_dim2, E_dim3, ...] to [E_dim1, E_dim2*E_dim3*...], where E can be either an input or an output – effectively reshaping all the inputs into a 2-dimensional numpy array.
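
For instance, a batch of images can be flattened to 2-D and restored as follows (shapes here are illustrative):

import numpy as np

X_img = np.random.rand(64, 28, 28, 1)      # [E_dim1, E_dim2, E_dim3, ...]
X_2d = X_img.reshape(X_img.shape[0], -1)   # [E_dim1, E_dim2*E_dim3*...]
X_back = X_2d.reshape(-1, 28, 28, 1)       # restore the original shape
assert X_back.shape == X_img.shape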

These custom transformers, depending on whether they are used for transforming X (features) or y (targets), can then be used from a custom estimator to override scikeras.wrappers.BaseWrappers.feature_encoder() or scikeras.wrappers.BaseWrappers.target_encoder(), respectively. Moreover, for models with multiple outputs, defining a custom scorer is advisable, especially when the outputs have different shapes or use different metrics.

CapsNet

At the risk of oversimplifying: CapsNet is a novel architecture proposed by Sabour, Frosst, and Hinton in late 2017 [1], in which they designed a network that could perform without the use of pooling layers. This is achieved by using capsules, which perform a form of ‘inverse rendering’ that is learnt by dynamic routing-by-agreement. In this tutorial we will not be going into the theory of CapsNet – those interested in theory can read this article for a working understanding, and refer to the original paper [1] for more details.

High Level CapsNet architecture Implementation

What we are interested in is the implementation of the Capsule Network and its overall architecture, since that is what we want to wrap with scikeras. The implementation used in this tutorial is based on the code made openly available by Xifeng Guo. The illustration shows a high-level version of the implemented architecture, with the approximate flows of inputs and outputs.

Design aspects covered in this implementation

  • The Capsule Layers need to be defined by the user or imported.
  • Dynamic routing of capsules via routing-by-agreement defines a custom flow of data within the model (implemented in the user-defined Capsule Layer).
  • The outputs are not of the same type – a One-Hot-Encoded (OHE) vector and a flattened image – instead of both being labels (for classifiers) or continuous values (for regressors).

Designing the Wrapper

Building on our discussion so far, the wrapper needs to override both BaseWrappers.feature_encoder() and BaseWrappers.target_encoder(). Depending on the type of transformation required, we can either write our own custom transformer or use one of the many transformers already offered in [sklearn.preprocessing](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.preprocessing). In this tutorial we will demonstrate both ways: we will write a custom transformer for the outputs and use a library transformer for the inputs.

Further, since the training mechanism of this Keras model cannot be strictly mapped to that of a classifier or a regressor (due to the reconstruction module), we will subclass BaseWrapper when defining our estimator. Moreover, for performance comparison of the model we need to consider two outputs – hence, a custom scorer will also be needed.

Output Transformer

For our specific implementation, the targets needed by the Keras model have to be in the form [y_true, X_true], while sklearn expects a single numpy array to be fed as the targets array. The transformer we define needs to interface seamlessly between the two. This is achieved by fitting the transformer to the outputs in the fit method, with a transform method that reshapes the targets into the list of arrays expected by Keras, and an inverse_transform method that reshapes the outputs back into the form expected by sklearn.

We create our custom transformer, MultiOutputTransformer, by subclassing (inheriting from) the BaseEstimator and TransformerMixin classes of sklearn, and define a fit method. This method can be used to incorporate multiple library encoders (like LabelEncoder or OneHotEncoder) into a single transformer, as demonstrated in the official tutorial, depending on the type of outputs. These encoders are fit to the targets so that the transform and inverse_transform methods can work appropriately. In this method, it is necessary to set the self.n_outputs_expected_ parameter to inform scikeras about the number of outputs from fit, while other parameters in meta can be set optionally. This method must return self.

In the code presented here, however, I have tried to demonstrate the implementation when no transformation is needed for the targets except for a possible separation and rearrangement. It should be noted that it would also be possible to define a FunctionTransformer over an identity function to achieve this (as demonstrated in the next section).

The get_metadata method is optional; it supplies extra entries to the meta parameter accepted by model_build_fn. Specific to this code, the transform method is straightforward; in the inverse_transform method, we need to define our custom inverse transformation, since we do not have any library encoders to rely on.
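
A minimal sketch of such a transformer, assuming the sklearn-style target array holds the OHE label columns followed by the flattened-image columns (the column split below is an assumption for illustration):

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

N_CLASSES = 10  # assumed number of label columns, for illustration

class MultiOutputTransformer(BaseEstimator, TransformerMixin):
    def fit(self, y):
        # No library encoders to fit here; just record the metadata
        self.n_outputs_expected_ = 2  # informs scikeras of the output count
        return self                   # fit must return self

    def get_metadata(self):
        # Optional: surfaces extra entries in `meta` for model_build_fn
        return {"n_outputs_expected_": self.n_outputs_expected_}

    def transform(self, y):
        # sklearn -> Keras: split the single array into [y_true, X_true]
        return [y[:, :N_CLASSES], y[:, N_CLASSES:]]

    def inverse_transform(self, y_list):
        # Keras -> sklearn: rearrange the list of arrays into one array
        return np.column_stack(y_list)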

Input Transformer

For the input transformer, we will use a library transformer already available in sklearn.preprocessing – the FunctionTransformer. With FunctionTransformer, it is possible to pass a lambda function as the func parameter of the transformer’s constructor. But a lambda function can cause issues with pickling, so we instead define a separate, named function to pass into FunctionTransformer.
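
A sketch of that idea; the split of the 2-D sklearn-style X back into the two input arrays the Keras model expects (a 28×28 image plus the OHE label fed to the decoder) is an assumption for illustration:

from sklearn.preprocessing import FunctionTransformer

def split_inputs(X):
    # Rebuild the list of input arrays the Keras model expects
    # from the flattened 2-D array sklearn passes around
    return [X[:, :784].reshape(-1, 28, 28, 1), X[:, 784:]]

# A module-level named function keeps the transformer picklable,
# which a lambda would not
input_transformer = FunctionTransformer(split_inputs)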

MIMO Estimator

To finish up the wrapper, we subclass BaseWrapper, as mentioned previously, and override the feature_encoder, scorer, and target_encoder functions. Note that in the scorer function we only evaluate the output from the capsule layer, since this is the metric on which we want our cross-validation runs to optimize the network.
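
A sketch of such an estimator, reusing the transformers from above (the scorer shown assumes the first N_CLASSES columns of the targets hold the OHE label):

from scikeras.wrappers import BaseWrapper
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import FunctionTransformer

class MIMOEstimator(BaseWrapper):
    @property
    def feature_encoder(self):
        # Library transformer for the inputs
        return FunctionTransformer(split_inputs)

    @property
    def target_encoder(self):
        # Custom transformer for the targets
        return MultiOutputTransformer()

    @staticmethod
    def scorer(y_true, y_pred, **kwargs):
        # Score only the capsule output; ignore the reconstruction
        return accuracy_score(
            y_true[:, :N_CLASSES].argmax(axis=1),
            y_pred[:, :N_CLASSES].argmax(axis=1),
        )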

Hyperparameter Tuning with MIMOEstimator

The next steps are pretty similar to the first example using the wrappers in tf.keras. We instantiate MIMOEstimator using get_model, and pass the (hyper)parameters of get_model as routed parameters (with the model__ prefix). These routed arguments also include the hyperparameters that we would like to tune using grid-search.

Next, we define the params dict containing the hyperparameters to tune and the corresponding values to try out as key-value pairs. We use clf as the estimator to create the GridSearchCV object, and then fit it to the data.
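
Putting it together, a sketch of the tuning step (the hyperparameter names and values below are illustrative, and X and y are assumed to be the 2-D sklearn-style arrays prepared earlier):

from sklearn.model_selection import GridSearchCV

clf = MIMOEstimator(
    model=get_model,
    model__n_routings=3,     # routed to get_model
    model__lam_recon=0.392,  # routed to get_model
    epochs=10,
    verbose=0,
)

params = {
    "model__n_routings": [2, 3, 4],
    "model__lam_recon": [0.2, 0.392, 0.5],
}

gs = GridSearchCV(estimator=clf, param_grid=params, cv=3)
gs_res = gs.fit(X, y)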

Care must be taken while specifying the cv argument for GridSearchCV, to maintain a suitable relationship between the number of training examples (n), the batch size (b), and the number of cross-validation folds (cv) – n should be completely divisible by cv * b.
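
A quick sanity check of that relationship (the numbers here are illustrative):

n, b, cv = 60000, 100, 3
assert n % (cv * b) == 0, "n should be divisible by cv * b"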

The results of the grid-search are accumulated in gs_res after the fit operation. The best estimator can be obtained via the best_estimator_ attribute of gs_res; similarly, best_score_ gives the best score, and best_params_ gives the best combination of hyperparameters.
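
For example:

print(gs_res.best_score_)   # best cross-validated score
print(gs_res.best_params_)  # best hyperparameter combination
best_clf = gs_res.best_estimator_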


So, there it is – how to write a custom wrapper with minimal coding to use Keras models in conjunction with the sklearn API. I hope you found it helpful. If you have any suggestions or questions, please tell me about them in the comments section, especially if there is a use case/model where this wrapping fails. You can find the full code implementation below, along with a few more resources.

Code and other resources

Full code of this implementation can be found [here](https://gist.github.com/nairouz/5b65c35728d8fb8ec4206cbd4cbf9bea). A tutorial on custom Keras layers can be found [here](https://keras.io/api/layers/core_layers/masking/). Implemented CapsNet layers can be found [here](https://github.com/naturomics/CapsLayer).

Academic References

[1] Sabour S., Frosst N., Hinton G.E., Dynamic Routing Between Capsules (2017), Advances in Neural Information Processing Systems 2017 (pp. 3856–3866)

