
Primer on developing reproducible Neural Networks in Jupyter Notebook

It takes more than just setting the seed for iterative development

Photo by Jeroen den Otter on Unsplash

Exploring NASA’s turbofan dataset

In preparation for my next article, I was playing around with different preprocessing settings in a Jupyter Notebook, prior to training a Neural Network (NN). After trying a few settings, I decided to revert to a previous setting because it performed better. However, upon executing the cells, the results of the NN weren’t the same even though I had set the seed…

Thus began my quest to get reproducible results while developing NNs in Jupyter Notebook, specifically for the scenario where you’re going back and forth between preprocessing and training cells.

Reproducible and comparable results primer

When developing NN models, it’s important to compare model performance and verify the effectiveness of your attempts to improve the model. Compiling a NN initializes random weights for each connection in the network, and these weights are updated during training. The effect of this random initialization is large enough to shift model performance by a few tenths of a point, which hampers model comparison. Therefore, we need to ‘control’ this random initialization so that, when you go back and forth between iterations of pre-processing or feature engineering, the results remain comparable.

To achieve this, I’ve found I need to control this randomness on two levels:

  1. Between-session reproducibility (always applies) – You need to set the seeds of the various random number generators to make sure the NNs you train in a notebook return the same results every time you start the notebook (either on a different day or after restarting the kernel). The native Python seed even has to be set before importing other packages [1, 2]. This ensures the generated weights will be the same each time you execute the code, e.g. the first set of randomly generated values will always be the same, the second set will always be the same, and so on.
  2. Within-session comparison (applies specifically to notebooks) – After compiling a NN (the first draw of random weights), you might want to try different pre-processing methods. You have to save the weights right after the NN is compiled and reload those weights before (re)training. Otherwise, re-compiling your NN generates new weights based on the seed you’ve set (the second draw of random weights). Compare it to rigged sampling without replacement [3]: your first draw will always yield the same result, and your second draw will also always yield the same result, but the second isn’t equal to the first. Therefore, you have to save and restore the weights of your first initialization. In addition, you also have to reset the optimizer so it doesn’t continue learning from its last state.

These two steps are crucial to get both comparable and reproducible results.

One final note: when setting your random seed, it could be that the seed happens to generate terrible initial weights. It’s therefore advised to try a few random seeds to make sure you didn’t get the short end of the stick.

Let’s see what this looks like in action

Example implementation

For this example implementation I’ll use NASA’s CMAPSS dataset on turbofan engine degradation. The goal is to predict the Remaining Useful Life (RUL) of the engines.

First, you import the libraries and set the random seed for the Python environment, its built-in random library, NumPy and TensorFlow.
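Below is a minimal sketch of that setup, assuming TensorFlow 2.x; the seed value of 42 and the exact ordering are my own choices, following the Keras reproducibility FAQ [1].

```python
import os
os.environ['PYTHONHASHSEED'] = '42'  # must be set before importing other packages

import random
import numpy as np
import pandas as pd
import tensorflow as tf

seed = 42
random.seed(seed)          # Python's built-in random library
np.random.seed(seed)       # NumPy
tf.random.set_seed(seed)   # TensorFlow 2.x (TF 1.x uses tf.set_random_seed)
```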

Next, you read in the data.
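A sketch of the loading step, assuming the standard CMAPSS layout of the FD001 training file (space-separated, no header, 26 columns); the file path and column names are my assumptions.

```python
index_names = ['unit_nr', 'time_cycles']
setting_names = ['setting_1', 'setting_2', 'setting_3']
sensor_names = ['s_{}'.format(i) for i in range(1, 22)]
col_names = index_names + setting_names + sensor_names

# the CMAPSS text files contain trailing whitespace, hence the regex separator
train = pd.read_csv('train_FD001.txt', sep=r'\s+', header=None, names=col_names)
train.head()
```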

Result of train.head()

I will take a shortcut and skip explaining a few data preparation steps. You can check out the full code through the link at the bottom. Let’s define a simple MLP.
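A sketch of such an MLP; the layer sizes and activations below are assumptions on my part, not necessarily the exact architecture from the full code.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

input_dim = len(sensor_names)  # one input per sensor signal

model = Sequential([
    Dense(32, activation='relu', input_shape=(input_dim,)),
    Dense(64, activation='relu'),
    Dense(32, activation='relu'),
    Dense(1),  # single output node: the predicted Remaining Useful Life
])
```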

After defining the model, we compile it. By compiling the model, weights are initialized according to the seed we’ve set in the beginning. It’s the first draw of our random weights and supports between-session reproducibility. Additionally, the initialized weights are saved for later (re)use. Saving the weights is an important step to support within-session comparison.
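A sketch of the compile-and-save step; the loss, optimizer and weights file name are assumptions for illustration.

```python
# Compiling initializes the weights: the first draw from our seeded generators
model.compile(loss='mean_squared_error', optimizer='adam')

# Save the freshly initialized weights so this exact starting point
# can be restored before every (re)training run
model.save_weights('initial_weights.h5')
```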

For the code block below, the most important line is line 1. The value of alpha determines the strength of a smoothing filter.
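The sketch below keeps alpha on line 1 and applies exponential smoothing per engine unit; the helper function and the prepared RUL target are assumptions standing in for the skipped preparation steps.

```python
alpha = 0.4  # smoothing strength; alpha = 1.0 effectively disables smoothing

# Hypothetical helper: exponentially smooth the sensor signals per engine unit
def exponential_smoothing(df, sensors, alpha):
    df = df.copy()
    df[sensors] = df.groupby('unit_nr')[sensors].transform(
        lambda s: s.ewm(alpha=alpha, adjust=False).mean())
    return df

train_smoothed = exponential_smoothing(train, sensor_names, alpha)
X_train = train_smoothed[sensor_names]
y_train = train_smoothed['RUL']  # RUL column computed in the skipped prep steps
```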

Because I work in a notebook, I go back and forth between the data preparation cell and the model fitting cell. For that reason, I recompile the model before fitting to reset the optimizer; otherwise it continues learning from its last state. However, recompiling the model generates a new set of weights (the second draw of random weights). Therefore, I reload the initial weights to get comparable and reproducible results.
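A sketch of that fitting cell; the epochs and batch size are assumptions, and the RMSE is computed on the training set as in the results below.

```python
from sklearn.metrics import mean_squared_error

# Re-compile to reset the optimizer state (this also re-draws random weights)...
model.compile(loss='mean_squared_error', optimizer='adam')
# ...then restore the saved initial weights so every run starts identically
model.load_weights('initial_weights.h5')

model.fit(X_train, y_train, epochs=20, batch_size=64, verbose=0)

y_hat_train = model.predict(X_train)
train_rmse = np.sqrt(mean_squared_error(y_train, y_hat_train))
print('alpha = {}, train_rmse = {:.2f}'.format(alpha, train_rmse))
```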

I can now experiment with various filter strengths to get an understanding of their effect. Once I’ve overshot the value of alpha, I can simply go back to a previous value and be sure I get the same results.

alpha = 1.0, train_rmse = 18.00
alpha = 0.4, train_rmse = 17.43
alpha = 0.2, train_rmse = 17.82
alpha = 0.4, train_rmse = 17.43!

Unfortunately, smoothing the data for FD001 makes the predictions worse on the test set, but that’s beside the point of this toy example. The point is: by implementing these two steps, you get within- and between-session reproducible results. You can go back and forth between preprocessing and training cells, trying different settings. When you want to use a previous setting, you can be assured the results are the same. In addition, when you return the following day (or restart the kernel), you can be assured the results are still the same.

I hope this makes your life easier! For the full code, please check out my github page here. I would like to thank Maikel Grobbe and Jeffrey Luppes for their input and reviewing my article.


For readers who follow my series "Exploring NASA’s turbofan dataset", the initial NN developed here (without smoothing or feature selection) already has a test RMSE of 18.50 on the FD001 dataset. That’s a whopping 42% improvement over the baseline model and a big improvement over the previous best, a Support Vector Regression with an RMSE of 20.54 (an overall improvement of 35.7% on the baseline). In my next article, we’ll delve into FD002, in which the turbofan engines run on different operating conditions and the exponential smoothing does have a beneficial effect.


References:
[1] https://keras.io/getting_started/faq/#how-can-i-obtain-reproducible-results-using-keras-during-development
[2] https://stackoverflow.com/questions/32419510/how-to-get-reproducible-results-in-keras/59076062#59076062
[3] https://www.onlinemathlearning.com/probability-without-replacement.html

