How to rapidly test dozens of deep learning models in Python

Thomas Ciha
Towards Data Science
6 min read · Sep 22, 2018


Let’s develop a neural network assembly line that allows us to easily experiment with numerous model configurations.

Assembly Line of Neural Networks (Source: all_is_magic on Shutterstock & Author)

Optimizing machine learning (ML) models is not an exact science. The best model architecture, optimization algorithm, and hyperparameter settings depend on the data you’re working with. Being able to quickly test several model configurations is therefore imperative for maximizing productivity and driving progress in your ML project. In this article, we’ll build an easy-to-use interface that lets you do exactly that: an assembly line for ML models.

Deep learning models are governed by a set of hyperparameters, so we can write functions that take those hyperparameters as arguments and build models on the fly. Here are the primary hyperparameters that govern a neural network:

  • Number of hidden layers
  • Number of neurons per layer
  • Activation function
  • Optimization algorithm
  • Learning rate
  • Regularization technique
  • Regularization hyperparameters

We can package these into a hash table:

model_info = {}
model_info['Hidden layers'] = [100] * 6              # six hidden layers, 100 neurons each
model_info['Input size'] = og_one_hot.shape[1] - 1   # number of input features
model_info['Activations'] = ['relu'] * 6             # one activation per hidden layer
model_info['Optimization'] = 'adadelta'
model_info['Learning rate'] = .005
model_info['Batch size'] = 32
model_info['Preprocessing'] = 'Standard'             # standard scaling
model_info['Lambda'] = 0                             # no weight penalty in this baseline

# A second configuration that adds L2 regularization
model_2 = dict(model_info)
model_2['Regularization'] = 'l2'
model_2['Reg param'] = 0.0005

Before we begin experimenting with various model architectures, let’s visualize the data to see what we’re working with. Although standard scaling is the de facto preprocessing method, I visualized the data using a variety of preprocessing tactics, reducing the dimensionality with PCA and t-SNE for each one. Below are the visualizations in which the data appears most separable:

Source: Author

We can then define a function that will construct & compile a neural network given a hyperparameter hash table:
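The full gist isn’t reproduced here, but a simplified sketch of what build_nn() boils down to, written against the Keras Sequential API and the hash-table keys above, looks roughly like this. The sigmoid output, binary cross-entropy loss, and optimizer mapping are illustrative assumptions rather than the exact original implementation:

from keras.models import Sequential
from keras.layers import Dense, Activation
from keras import optimizers, regularizers

def build_nn(model_info):
    # Translate the optimizer name into a Keras optimizer object so the learning rate is honored
    opt_name = model_info['Optimization']
    lr = model_info.get('Learning rate', 0.001)
    opt_lookup = {'adam': optimizers.Adam, 'adadelta': optimizers.Adadelta,
                  'sgd': optimizers.SGD, 'rmsprop': optimizers.RMSprop}
    optimizer = opt_lookup[opt_name](lr=lr) if opt_name in opt_lookup else opt_name

    # Optional L2 weight penalty
    reg = None
    if model_info.get('Regularization') == 'l2':
        reg = regularizers.l2(model_info.get('Reg param', 0.0))

    # Stack Dense + Activation layers according to the hash table
    model = Sequential()
    for i, (units, act) in enumerate(zip(model_info['Hidden layers'],
                                         model_info['Activations'])):
        if i == 0:
            model.add(Dense(units, input_dim=model_info['Input size'],
                            kernel_regularizer=reg))
        else:
            model.add(Dense(units, kernel_regularizer=reg))
        model.add(Activation(act))

    # Single sigmoid output for binary classification (an assumption about the task)
    model.add(Dense(1))
    model.add(Activation('sigmoid'))

    model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
    return model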

Now that we have a fast, flexible way of constructing and compiling neural networks, we can quickly test a few baseline models. This allows us to draw quick inferences about which hyperparameters seem to work best:
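For example, a quick round of baseline experiments could look roughly like the following. The configurations, epoch count, and the assumption that og_one_hot is a NumPy array with the label in the first column are purely illustrative:

from sklearn.model_selection import train_test_split

# Assumes og_one_hot is a NumPy array with the label in the first column (illustrative)
X, y = og_one_hot[:, 1:], og_one_hot[:, 0]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# A few baselines that differ only in depth and width (illustrative values)
baselines = []
for n_layers, n_units in [(2, 15), (4, 60), (6, 120)]:
    info = dict(model_info)              # copy the template defined above
    info['Hidden layers'] = [n_units] * n_layers
    info['Activations'] = ['relu'] * n_layers
    baselines.append(info)

for info in baselines:
    model = build_nn(info)
    model.fit(X_train, y_train, batch_size=info['Batch size'], epochs=20, verbose=0)
    loss, acc = model.evaluate(X_test, y_test, verbose=0)
    print(info['Hidden layers'], 'test accuracy: %.3f' % acc)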

Using the function above, and after evaluating over a dozen model architectures with 5-fold cross validation, I found that deeper and wider architectures are necessary to obtain high performance on this data, most likely because of its non-linear structure. The graphs below also illustrate diminishing returns: increasing the number of hidden units in each layer from 15 to 120 results in a notable performance improvement on the training data, but virtually no improvement on the test data. This is a sign of overfitting: the performance on the training set is not generalizing to the test data.

Aside: If you’re not familiar with k-fold cross validation, it’s a model evaluation technique that involves divvying up the data into k disjoint partitions. One partition is used as the test set and the rest as the training set, and we iterate through the folds so that every partition gets a turn as the test set. Performing k-fold cross validation gives us a robust assessment of the model’s performance.

Source: Author
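To make the aside concrete, a bare-bones k-fold loop around build_nn(), using scikit-learn’s KFold, might look like the sketch below. The epoch count and accuracy metric are assumptions:

import numpy as np
from sklearn.model_selection import KFold

def cross_validate(model_info, X, y, k=5):
    # Train a fresh model on each fold and average the held-out accuracy
    scores = []
    for train_idx, test_idx in KFold(n_splits=k, shuffle=True).split(X):
        model = build_nn(model_info)
        model.fit(X[train_idx], y[train_idx],
                  batch_size=model_info['Batch size'], epochs=20, verbose=0)
        _, acc = model.evaluate(X[test_idx], y[test_idx], verbose=0)
        scores.append(acc)
    return np.mean(scores), np.std(scores)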

K-fold cross validation is a robust way to assess a model’s performance, but it is computationally expensive. While optimizing hyperparameters, we can split the data into a training and test set to draw quicker conclusions, and save the models after each epoch so they are retrievable after training. The TensorBoard callback can also be used to examine how each model was trained:
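A sketch of that training setup, using Keras’ ModelCheckpoint and TensorBoard callbacks, is shown below. The file paths, epoch count, and helper name are placeholders:

from keras.callbacks import ModelCheckpoint, TensorBoard

def train_with_logging(model_info, X_train, y_train, X_test, y_test, name):
    model = build_nn(model_info)
    callbacks = [
        # Save the model after every epoch so it can be reloaded later
        # (assumes a models/ directory already exists)
        ModelCheckpoint('models/' + name + '_{epoch:02d}.h5'),
        # Write training curves and the graph for TensorBoard (no spaces in the path!)
        TensorBoard(log_dir='logs/' + name),
    ]
    model.fit(X_train, y_train,
              validation_data=(X_test, y_test),
              batch_size=model_info['Batch size'],
              epochs=50, callbacks=callbacks, verbose=0)
    return model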

We can then obtain a more robust performance assessment once we have gained some insights as to what hyperparameter settings are working well.

Grid search is not the go-to method for hyperparameter optimization in industry. Rather, a coarse-to-fine approach is more frequently employed: we start with a broad range of hyperparameters, home in on the settings that work best, and then randomly sample hyperparameter values from that narrowed range. Now that we have a way of dynamically instantiating deep neural networks, we can rapidly iterate over numerous model configurations:
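Here is a rough sketch of that coarse search, reusing the train_with_logging() helper sketched above. The sampling ranges are illustrative, not the ones used in the original experiments:

import random

# Coarse ranges to sample from (illustrative values)
depth_choices = [3, 4, 5, 6]
width_choices = [60, 80, 100, 120]

configs = []
for _ in range(20):
    info = dict(model_info)
    depth = random.choice(depth_choices)
    info['Hidden layers'] = [random.choice(width_choices)] * depth
    info['Activations'] = ['relu'] * depth
    info['Learning rate'] = 10 ** random.uniform(-4, -2)   # log-uniform sampling
    configs.append(info)

for i, info in enumerate(configs):
    train_with_logging(info, X_train, y_train, X_test, y_test, 'model_%d' % i)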

Aside: When pointing TensorBoard at your log directory from the terminal, you CANNOT have spaces in the file path. On Windows, spaces in the log directory prevent TensorBoard from loading the data properly.

The code above will also save important metrics (e.g., the area under the ROC curve) for each model into a CSV file so we can easily compare and contrast what hyperparameters lead to variations in performance.
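A minimal version of that bookkeeping, using scikit-learn’s roc_auc_score and a hypothetical results.csv file, could look like this:

import csv
from sklearn.metrics import roc_auc_score

def log_results(info, model, X_test, y_test, path='results.csv'):
    # Append one row per model: architecture, key hyperparameters, and ROC AUC
    auc = roc_auc_score(y_test, model.predict(X_test).ravel())
    with open(path, 'a', newline='') as f:
        csv.writer(f).writerow([info['Hidden layers'], info['Optimization'],
                                info['Learning rate'], auc])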

Once we have a better idea of what hyperparameter values work well, we can begin to optimize the model within this range of values. The following function generates a randomized neural network. We can then use this function to experiment with various randomized hyperparameter settings within the range of values we have narrowed down:
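A simplified sketch of such a generator, with narrowed (and purely illustrative) ranges, might look like this:

import random

def random_model_info(input_size):
    # Sample a configuration from the narrowed ranges (values are illustrative)
    depth = random.randint(4, 6)
    width = random.randint(100, 130)
    info = {
        'Input size': input_size,
        'Hidden layers': [width] * depth,
        'Activations': ['relu'] * depth,
        'Optimization': random.choice(['adadelta', 'adam']),
        'Learning rate': 10 ** random.uniform(-3.5, -2),
        'Batch size': random.choice([16, 32, 64]),
        'Preprocessing': 'Standard',
        'Regularization': 'l2',
        'Reg param': 10 ** random.uniform(-5, -3),
    }
    return build_nn(info), info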

In this article, we learned how to quickly experiment with numerous model architectures and hyperparameter settings. If you liked the article or learned something new, please feel free to follow me on Medium or leave a clap. Thanks again!

Source code: here

Caveat:

I have discovered a bug in the code that may be caused by the TensorBoard Keras callback or the build_nn() function. It’s not a major issue, but I want to bring it to your attention. The problem is that multiple graphs are written to the TensorBoard log files when testing a list of neural nets. For instance, when we run the “Model experimentation” code above, we get this graph visualization for model_1:

Erroneous Graph (Source: Author)

As you can see, there are two distinct graphs here: one corresponding to model_0 (left) and one for model_1 (right). This doesn’t happen for model_0’s graph because it was trained prior to model_1:

Visualizing model_0 Architecture via Tensorboard (Source: Author)

However, if we load model_1 subsequent to training, we can see the architecture is correct for the hash table we passed to build_nn():

print("Model 1")
saved_model = load_model(FILE_PATH)
print(saved_model.summary())
Model 1
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_8 (Dense) (None, 110) 10120
_________________________________________________________________
activation_8 (Activation) (None, 110) 0
_________________________________________________________________
dense_9 (Dense) (None, 110) 12210
_________________________________________________________________
activation_9 (Activation) (None, 110) 0
_________________________________________________________________
dense_10 (Dense) (None, 110) 12210
_________________________________________________________________
activation_10 (Activation) (None, 110) 0
_________________________________________________________________
dense_11 (Dense) (None, 1) 111
_________________________________________________________________
activation_11 (Activation) (None, 1) 0
=================================================================
Total params: 34,651
Trainable params: 34,651
Non-trainable params: 0
_________________________________________________________________

This suggests there are no underlying issues with using this code to obtain accurate assessments, but it does prevent us from visualizing models in the pipeline. If you have any insights or suggestions, or know why this is occurring, please feel free to make changes to the code or let me know. I would like to continue looking into this, but I’m unable to do so as a full-time student with three part-time jobs. Thanks again for reading!
