Tune the hyperparameters of your deep learning networks in Python using Keras and Talos

Using Talos to grid-search hyperparameters in CNNs, e.g. a dog-cat CNN classifier

Yefeng Xia
Towards Data Science


Photo by Mario Gogh on Unsplash

With the development of deep learning frameworks, it has become much easier for many people to design the architecture of an artificial neural network. The three most popular frameworks, TensorFlow, Keras, and PyTorch, are the most frequently used. There are many approaches to improve the performance of a neural network, e.g. improving the data quality or using data augmentation. However, data quality is the foundation of data science, and getting better data is usually expensive, time-consuming, and labor-intensive. Therefore, we prefer to work on the hyperparameters/parameters of a model🏄🏼.

Let’s get started!

1. Parameters or Hyperparameters

A model parameter is a configuration variable that is internal to the model. It depends on the model’s training data and can be estimated by fitting the given data to the model.

A model hyperparameter, in contrast, is a configuration that is external to the model and does not depend on the training data. Hyperparameters help us find the model parameters.

We find the model’s parameters by setting a series of hyperparameters properly, which optimizes the training process and makes the best use of the data. In other words, we assign hyperparameter values manually and they cannot be updated during training, while parameters are intrinsic to the model and are updated continuously during training.

As a rough analogy: if we regard a student as a model, his knowledge, character, and skills are like the model’s parameters, while the way we train him to acquire these abilities and features can be treated as the hyperparameters.

Since hyperparameters are the key to good model parameters, we should pay close attention to them. How do we select a model’s hyperparameters? Dealing with this question requires sufficient knowledge and patience.
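
As a small illustration (a toy sketch, not part of the classifier built later): in Keras terms, the weights of a layer are parameters that are learned during fit(), while values such as the learning rate or the batch size are hyperparameters that we choose beforehand.

from keras.models import Sequential
from keras.layers import Dense

learning_rate = 0.001  # hyperparameter: chosen by us, fixed during training
batch_size = 32        # hyperparameter: chosen by us, fixed during training

model = Sequential([Dense(10, activation='relu', input_shape=(4,))])
# the 4*10 weights + 10 biases of this Dense layer are parameters:
# they are initialized randomly and updated continuously by model.fit()
print(model.count_params())  # -> 50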

2. Strategies to tune hyperparameters

There are typically 5 different optimization techniques:

  1. Manual search: we choose some model hyperparameters based on our judgment/experience. We then train the model, evaluate its accuracy, and start the process again. This loop is repeated until a satisfactory accuracy is reached.
  2. Grid search: we set up a grid of hyperparameters and train/test our model on each of the possible combinations over a given subset of the hyperparameter space of the training algorithm. It is the traditional method of hyperparameter optimization (see the sketch after this list).
  3. Random search: it replaces the exhaustive evaluation of all combinations with a random selection of them, and can thereby reduce the number of search iterations (also shown in the sketch below).
  4. Bayesian optimization: a sequential design strategy for the global optimization of black-box functions. It reduces the number of search iterations by choosing the next input values with the past outcomes in mind.
  5. Evolutionary algorithm: it creates a population of N machine-learning models with some predefined hyperparameters, then generates offspring whose hyperparameters are similar to those of the best models, so that a population of N models is obtained again. Only the best models survive to the end of the process, which sequentially selects, combines, and varies hyperparameters using mechanisms that resemble biological evolution. It simulates natural selection: the individuals that can adapt to changes in their environment survive, reproduce, and pass on to the next generation.
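
As a minimal sketch of the difference between grid search and random search (using scikit-learn’s helpers; the toy search space below is made up for illustration):

from sklearn.model_selection import ParameterGrid, ParameterSampler

space = {'lr': [0.1, 0.01, 0.001], 'batch_size': [32, 64], 'dropout': [0.0, 0.5]}

# grid search: every combination is evaluated (3 * 2 * 2 = 12 runs)
for params in ParameterGrid(space):
    print(params)  # here we would train/evaluate a model with these values

# random search: only a random subset is evaluated (here 4 of the 12 runs)
for params in ParameterSampler(space, n_iter=4, random_state=42):
    print(params)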

3. Our own approach: grid search

In our work, we often use grid search. Grid search suffers from high-dimensional spaces, but it can easily be parallelized, since the hyperparameter values that the algorithm works with are usually independent of each other. Besides, we write our code on Colab, a platform that allows us to write and execute Python in the browser, with:

  • Zero configuration required
  • Free access to GPUs
  • Easy sharing

4. Keras and Talos

If you want to quickly build and test a neural network with minimal lines of code, then Keras is what you need. Keras is an open-source neural network library written in Python, an API designed for human beings, not machines. Since TensorFlow 2 comes with a tight integration of Keras through the intuitive high-level API tf.keras, there are two ways to use Keras: either import the standalone keras package directly, or import keras from tensorflow.
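
Both import styles look like this (a minimal sketch; the rest of this post uses the standalone package):

# Option 1: the standalone Keras package (used in the code below)
import keras

# Option 2: Keras integrated into TensorFlow 2
from tensorflow import keras  # or simply use tf.keras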

Talos was released on May 11, 2018, and has since been upgraded seven times. When running code with Talos’s Scan command, all possible hyperparameter combinations are tested in an experiment.

Important: Talos radically changes the ordinary Keras workflow by fully automating hyperparameter tuning and model evaluation. Talos exposes Keras functionality entirely and there is no new syntax or templates to learn.

We can install Talos with one line of command:

pip install talos

5. Dogs vs. Cats classifier with CNN

To make our results visible and intuitive, we take a simple case: classifying whether images contain a dog or a cat with a CNN, a classic problem in computer vision😆. I downloaded the image dataset from Kaggle. The dataset is in ZIP file format after you download it from the link.

Photo by Hanna Listek on Unsplash
We can unzip the file directly in Google Colab with a few lines of code:

from google.colab import drive
drive.mount('/content/gdrive/')

!mkdir -p dataset
!unzip /content/gdrive/My\ Drive/Colab\ Notebooks/blogs_medium/cat_dog.zip -d dataset/

the unzipped dataset in Colab

Here we use LeNet-5, a 22-year-old neural network that usually serves as a teaching example.

Now we can start building LeNet-5 with Keras. To get reproducible results in Keras, setting the random seeds is necessary.

import os
import random as python_random

import numpy as np
import tensorflow as tf

np.random.seed(42)
python_random.seed(42)
tf.random.set_random_seed(42)  # TensorFlow 1.x; in TensorFlow 2 use tf.random.set_seed(42)

Then we can focus on the image data. We need to read the images with keras.preprocessing.image into training and validation arrays, which later flow into the CNN for training and validation. All pictures must have a uniform size, e.g. (100, 100, 3). Although the dog and cat images in the dataset differ in size, some big and some small, we can give them all equal size by resizing.

import keras
import glob
import os
import numpy as np
from keras.preprocessing.image import ImageDataGenerator, load_img, img_to_array, array_to_img
from keras.layers import Dense, Conv2D, MaxPool2D, Flatten, Dropout
from keras import optimizers
from keras.models import Sequential

image_size = (100, 100)

# load the training images and label them (cat = 0, dog = 1)
train_cats = glob.glob('dataset/training_set/training_set/cats/*.jpg')
train_dogs = glob.glob('dataset/training_set/training_set/dogs/*.jpg')
train_files = [fn for fn in train_cats] + [fn for fn in train_dogs]
print(len(train_files))
train_imgs = [img_to_array(load_img(img, target_size=image_size)) for img in train_files]
train_imgs = np.array(train_imgs)
print(train_imgs.shape)
train_labels = [0 for i in range(len(train_cats))] + [1 for i in range(len(train_dogs))]

# load the validation images and label them in the same way
val_cats = glob.glob('dataset/test_set/test_set/cats/*.jpg')
val_dogs = glob.glob('dataset/test_set/test_set/dogs/*.jpg')
val_files = [fn for fn in val_cats] + [fn for fn in val_dogs]
val_imgs = [img_to_array(load_img(img, target_size=image_size)) for img in val_files]
val_imgs = np.array(val_imgs)
print(val_imgs.shape)
val_labels = [0 for i in range(len(val_cats))] + [1 for i in range(len(val_dogs))]

With the above code, all dog and cat images are stored in arrays, for both the training set and the validation set. Additionally, we label dogs with the digit 1 and cats with the digit 0.

Next, we encode the categorical integer labels 0 and 1 using one-hot encoding.

num_classes = 2
epochs = 10
input_shape = (100, 100, 3)

# encode text category labels
from sklearn.preprocessing import OneHotEncoder, LabelEncoder

train_labels_array = np.array(train_labels)
le = LabelEncoder()
train_integer_encoded = le.fit_transform(train_labels_array)
ohe = OneHotEncoder(sparse=False)
train_integer_encoded = train_integer_encoded.reshape(len(train_integer_encoded), 1)
train_labels_ohe = ohe.fit_transform(train_integer_encoded)

validation_labels_array = np.array(val_labels)
validation_integer_encoded = le.fit_transform(validation_labels_array)
ohe = OneHotEncoder(sparse=False)
validation_integer_encoded = validation_integer_encoded.reshape(len(validation_integer_encoded), 1)
validation_labels_ohe = ohe.fit_transform(validation_integer_encoded)

The data must be normalized so that the model converges faster.

train_imgs_scaled = train_imgs.astype('float32')
validation_imgs_scaled = val_imgs.astype('float32')
train_imgs_scaled /= 255
validation_imgs_scaled /= 255

Then we build the model structure:

from keras import layers
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from keras.models import Model
from keras import optimizers

def lenet_5(in_shape=(100, 100, 3), n_classes=2):
    in_layer = layers.Input(in_shape)
    conv1 = layers.Conv2D(filters=20, kernel_size=5, padding='same', activation='relu')(in_layer)
    pool1 = layers.MaxPool2D()(conv1)
    conv2 = layers.Conv2D(filters=50, kernel_size=5, padding='same', activation='relu')(pool1)
    pool2 = layers.MaxPool2D()(conv2)
    flatten = layers.Flatten()(pool2)
    dense1 = layers.Dense(500, activation='relu', kernel_initializer='glorot_uniform')(flatten)
    preds = layers.Dense(n_classes, activation='softmax', kernel_initializer='glorot_uniform')(dense1)
    opt = keras.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, amsgrad=False)
    model = Model(in_layer, preds)
    model.compile(loss="categorical_crossentropy", optimizer=opt, metrics=["accuracy"])
    return model

if __name__ == '__main__':
    model = lenet_5()
    print(model.summary())
model summary

Here we train the model for 10 epochs with a batch_size of 200.

from keras.callbacks import ModelCheckpoint

checkpoint = ModelCheckpoint("lenet.h5", monitor='val_acc', verbose=1,
                             save_best_only=True, save_weights_only=False,
                             mode='auto', period=1)
history = model.fit(x=train_imgs_scaled, y=train_labels_ohe,
                    validation_data=(validation_imgs_scaled, validation_labels_ohe),
                    batch_size=200, epochs=10,
                    callbacks=[checkpoint], shuffle=True)

After a long wait, we can get a training/validation diagram.
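
A minimal sketch for plotting the curves with matplotlib (assuming the history object returned by model.fit(); in older Keras versions the history keys are 'acc'/'val_acc', in newer ones 'accuracy'/'val_accuracy'):

import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
ax1.plot(history.history['acc'], label='train acc')
ax1.plot(history.history['val_acc'], label='val acc')
ax1.set_xlabel('epoch')
ax1.legend()
ax2.plot(history.history['loss'], label='train loss')
ax2.plot(history.history['val_loss'], label='val loss')
ax2.set_xlabel('epoch')
ax2.legend()
plt.show()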

after 10 epochs, val_acc 0.7207 and val_loss 0.5841
model acc and loss

The above diagram shows that the model does not improve much after 5 epochs, but it has not overfitted. Therefore, we can still use the obtained model.

We want to put more effort into training a better LeNet-5 model for our dog-cat classifier, so we focus on the model’s hyperparameters to improve it🎑.

6. Talos in LeNet-5 with code

Here we define a new function that has the same structure as LeNet-5, but in which some hyperparameters of the model are variable. We store these variable hyperparameters in the dictionary “p”.

from keras.optimizers import Adam, sgd  # the optimizer classes must be imported before the dictionary is defined

p = {'first_hidden_layer': [500],
     'opt': [Adam, sgd],
     'dropout': [0, 0.5],
     'weight_regulizer': [None],
     'lr': [1],
     'emb_output_dims': [None],
     'kernel_initializer': ['glorot_uniform']}

To reduce computation and running time, we make only 'opt' and 'dropout' variable in the dictionary: the optimizer has two options (Adam or sgd) and the dropout rate has two possible values. That gives a total of 4 combinations.

from keras.models import load_model
from keras.utils import CustomObjectScope
from keras.initializers import glorot_uniform
import talos
from talos.model.normalizers import lr_normalizer

def lenet_model(x_train, y_train, x_val, y_val, params):
    in_layer = layers.Input((100, 100, 3))
    conv1 = layers.Conv2D(filters=20, kernel_size=5, padding='same', activation='relu')(in_layer)
    pool1 = layers.MaxPool2D()(conv1)
    conv2 = layers.Conv2D(filters=50, kernel_size=5, padding='same', activation='relu')(pool1)
    pool2 = layers.MaxPool2D()(conv2)
    flatten = layers.Flatten()(pool2)
    dense1 = layers.Dense(params['first_hidden_layer'], activation='relu')(flatten)
    dropout1 = layers.Dropout(params['dropout'])(dense1)
    preds = layers.Dense(2, activation='softmax')(dropout1)
    model = Model(in_layer, preds)
    model.compile(loss="categorical_crossentropy",
                  optimizer=params['opt'](lr=lr_normalizer(params['lr'], params['opt'])),
                  metrics=["acc"])
    # use the data that Talos passes in, so the function works with any Scan call
    history = model.fit(x=x_train, y=y_train,
                        validation_data=(x_val, y_val),
                        batch_size=200, epochs=10,
                        callbacks=[talos.utils.ExperimentLogCallback('kgt', params)],
                        verbose=1)
    return history, model

t = talos.Scan(x=train_imgs_scaled, y=train_labels_ohe,
               x_val=validation_imgs_scaled, y_val=validation_labels_ohe,
               model=lenet_model, experiment_name='kgt', params=p)

With the help of the Scan command (talos.Scan), we configure and start the experiment. It will take considerably longer than training the single basic LeNet-5 model, since four models are trained.

progress bar during the training experiments

The experiment reports are saved in CSV file format. We can read the CSV file to show the results in a table.
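
A minimal sketch for inspecting the reports with pandas (the CSV file name below is hypothetical; Talos generates it inside a folder named after the experiment, here 'kgt'):

import pandas as pd

results = pd.read_csv('kgt/experiment_log.csv')  # hypothetical file name, adjust to the generated one
print(results.sort_values('val_acc', ascending=False))

# the Scan object also keeps the results as a pandas DataFrame
print(t.data.head())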

experiment reports in table
Top: val_acc plot. Bottom: val_loss plot

By plotting the validation accuracy (top) and the validation loss (bottom) of the four experiments and comparing their parameter information, we find that the models trained with Adam perform much better than those trained with sgd. The dropout method played only a small role in training LeNet-5.

All things considered, model 0 has the best performance: it uses Adam and no dropout.

7. Conclusion

In this story, we showed how to use Talos to tune the hyperparameters of a CNN built with Keras. At the beginning, we covered some basic knowledge about parameters and hyperparameters and reviewed the usual methods to optimize hyperparameters. In the rest of the story, we built a LeNet-5-based cat-dog classifier and scanned all hyperparameter combinations of interest. By observing the validation metrics, we can tell which hyperparameter has the most influence and which combination gives the best result🏁.

The code is available on my GitHub😬:

https://github.com/Kopfgeldjaeger/Medium_blogs_code/tree/master/2_talos_grid_search

