
How to Train Your Neural Network? A BayesOpt-Based AutoML Approach


In recent years, there has been a surge of interest in using machine learning to model physics applications. Our research group at UMass Amherst has been applying these data-driven methods to problems in turbulence closure, coarse-graining, and combustion, and to building cheap-to-evaluate data-driven emulators for complex physical processes. In this blog, we introduce the current challenges in applying mainstream machine learning algorithms to domain-specific applications and showcase how we deal with these challenges in our scientific application of interest.


Let’s admit it, robustly training a neural network is hard! Identifying the best network architecture and hyperparameters a priori for a given dataset, with little more than intuition to go on, makes it all the more challenging!

Current Challenges: The mainstream ML algorithms, and the suggested best practices for designing neural networks and choosing their hyperparameters, have been developed for applications such as computer vision and natural language processing. The loss manifolds of those datasets are by nature vastly different from the ones observed in scientific datasets [see Figure 1 for an example: a loss surface generated from a function of two variables, obtained by translating and scaling Gaussian distributions, a distribution commonly found in science and engineering]. As a result, these best practices from the literature often translate poorly to applications within the scientific domain.

A major challenge is that the data used to develop mainstream machine learning algorithms differ from scientific data in fundamental ways: scientific data are often high-dimensional, multimodal, complex, structured, and/or sparse [1,2]. The current pace of innovation in SciML is additionally driven by, and limited to, empirical exploration and experimentation with these canonical ML/DL techniques. To fully leverage the complicated nature of the data and to develop optimized ML models, automatic machine learning (AutoML) methods that automate network design and choose the best-performing hyperparameters are critically needed. In this work, we explore the effectiveness of Bayesian-based AutoML methods for complex physics emulators in an effort to build robust neural network models.

Fig 1. A non-convex loss manifold with multiple local optima, which makes the neural network training process difficult.

Our application: Our scientific application of interest is fluid turbulence, a non-linear, non-local, multi-scale phenomenon that is hard to resolve for engineering-relevant flows such as those in an Internal Combustion Engine [see Figure 2 for the typical range of scales in a simulation]. While solving the full Navier-Stokes equations with Direct Numerical Simulation (DNS) yields the most accurate representation of this complicated phenomenon, DNS is often computationally intractable. Popular reduced-order models such as Reynolds-Averaged Navier-Stokes (RANS) and Large Eddy Simulation (LES), while useful for (comparatively) faster design-space exploration, suffer from the difficulty of turbulence closure. In this study, we demonstrate the capability of DNNs to learn an algebraic closure as a function of the filtered variables.

Fig 2. The multiscale nature of turbulence as seen from a canonical ICE simulation. Source [2]

Proposed solution: To optimize the network architecture and identify the best hyperparameter settings for this task, we use a Bayesian Optimization-based AutoML framework, in which the learning algorithm’s generalization performance is modeled as a sample from a Gaussian Process (GP) [Figure 3 lays out the entire workflow]. Expected Improvement is used as the acquisition function to identify the next set of parameters to investigate. Key learning parameters that have a leading-order effect on convergence, including the learning rate, drop factor, batch size, and network architecture, are optimized. The Bayesian Optimization process evaluates one set of parameters at a time: within each run, the network is trained and its performance on the validation data is evaluated and recorded.

Fig 3. An end-to-end workflow used in the study. An autoML method based
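As a concrete illustration of this loop (a minimal sketch, not the exact code used in the study), the snippet below uses scikit-optimize’s `gp_minimize`, which fits a GP surrogate to the observed validation losses and picks the next hyperparameter set via Expected Improvement. The search-space bounds and the `train_and_validate` helper are illustrative assumptions; in practice the helper would build and train the network and return its validation loss.

```python
# A minimal sketch of GP-based Bayesian hyperparameter optimization.
from skopt import gp_minimize
from skopt.space import Real, Integer, Categorical
from skopt.utils import use_named_args

# Hypothetical search space mirroring the parameters mentioned above.
space = [
    Real(1e-5, 1e-1, prior="log-uniform", name="learning_rate"),
    Real(0.1, 0.9, name="drop_factor"),           # learning-rate drop factor
    Integer(16, 256, name="batch_size"),
    Integer(2, 10, name="depth"),                 # number of FC + leakyReLU blocks
    Integer(16, 512, name="width"),               # neurons per layer
    Categorical(["adam", "rmsprop", "sgdm"], name="optimizer"),
    Categorical(["glorot", "he"], name="init"),
]

def train_and_validate(learning_rate, drop_factor, batch_size, depth, width,
                       optimizer, init):
    # Placeholder: build the network from these settings, train it, and return
    # the validation loss. A cheap synthetic proxy is used here so the sketch
    # runs end-to-end.
    import math
    return (math.log10(learning_rate) + 3) ** 2 + 0.01 * depth + 0.001 * width

@use_named_args(space)
def objective(**params):
    return train_and_validate(**params)

result = gp_minimize(
    objective,
    space,
    acq_func="EI",        # Expected Improvement acquisition
    n_calls=50,           # total sequential evaluations
    n_initial_points=10,  # random exploration before fitting the GP
    random_state=0,
)
print("Best validation loss:", result.fun)
print("Best hyperparameters:", result.x)
```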

To optimize the network architecture, a repeating network block consisting of a Fully Connected layer followed by a leakyReLU activation layer is used [Figure 4]. The depth of the network and its width (the number of neurons in each layer) are further optimized by the Bayesian Optimization during each evaluation. The optimizer (ADAM, RMSProp, SGDM) and weight initialization strategy (Glorot, He) are optimized as well.

Fig 4. The repeating network block (Fully Connected + leakyReLU layers) whose depth and width are optimized by the autoML method
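For readers who want to see what such a searchable block looks like in code, here is a minimal PyTorch sketch (an illustration, not the study’s implementation) that builds the MLP from a sampled depth, width, and initialization; the argument names are assumptions for this example.

```python
import torch
import torch.nn as nn

def build_mlp(n_inputs, n_outputs, depth, width, init="glorot", negative_slope=0.01):
    """Stack `depth` blocks of [Linear -> LeakyReLU], then a linear output layer.

    `depth`, `width`, and `init` are the quantities exposed to the Bayesian
    Optimization search in this sketch.
    """
    layers = []
    in_features = n_inputs
    for _ in range(depth):
        layers += [nn.Linear(in_features, width), nn.LeakyReLU(negative_slope)]
        in_features = width
    layers.append(nn.Linear(in_features, n_outputs))
    model = nn.Sequential(*layers)

    # Apply the sampled weight-initialization strategy to all linear layers.
    def _init(m):
        if isinstance(m, nn.Linear):
            if init == "glorot":
                nn.init.xavier_uniform_(m.weight)
            elif init == "he":
                nn.init.kaiming_uniform_(m.weight, nonlinearity="leaky_relu")
            nn.init.zeros_(m.bias)
    model.apply(_init)
    return model

# Example: a network mapping 6 filtered flow variables to a single closure term
# (the input/output sizes here are hypothetical).
net = build_mlp(n_inputs=6, n_outputs=1, depth=4, width=64, init="he")
```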

Results

We find that the ADAM-Glorot and ADAM-He combinations perform best in terms of absolute errors, although the ADAM-Glorot configuration has the highest number of parameters. RMSProp, on average, performs better than the SGDM optimizer. This can be explained by the fact that RMSProp is an adaptive learning rate method capable of navigating regions of local optima, whereas SGDM navigates ravines poorly and makes hesitant progress towards local optima.

To better understand the learning process, we compute the cosine similarity between the network weights checkpointed after every epoch of training. This investigation [see Figure 5] reveals the self-similarity of the learning process within each weight initialization-optimizer setting. When compared across different optimizers, however, the differences in the learning process are clearly elucidated.

Fig 5. The evolution of the learning process can be visualized using the cosine similarity between network weights checkpointed at every epoch.
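A simple way to reproduce this kind of diagnostic (a sketch under the assumption that per-epoch checkpoints are available as flattened weight vectors) is to compute the pairwise cosine similarity between all checkpoints:

```python
import numpy as np

def cosine_similarity_matrix(checkpoints):
    """Pairwise cosine similarity between flattened weight vectors.

    `checkpoints` is assumed to be a list of 1-D numpy arrays, one per epoch,
    e.g. obtained by concatenating every layer's weights at that checkpoint.
    """
    W = np.stack(checkpoints)                          # (n_epochs, n_weights)
    W_norm = W / np.linalg.norm(W, axis=1, keepdims=True)
    return W_norm @ W_norm.T                           # (n_epochs, n_epochs)

# Example with synthetic "checkpoints" drifting away from a random initialization.
rng = np.random.default_rng(0)
w0 = rng.normal(size=10_000)
checkpoints = [w0 + 0.05 * epoch * rng.normal(size=w0.size) for epoch in range(20)]
similarity = cosine_similarity_matrix(checkpoints)
print(similarity.shape)  # (20, 20); visualize e.g. with plt.imshow(similarity)
```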

Since neural network weights are high-dimensional, a dimensionality reduction technique such as t-SNE is useful for visualizing the learning process and exploring the function space. The effect of initialization is clearly evident in Figure 6: the Glorot and He initializations have overlapping function-space behavior, which is expected due to the similarities in their mathematical formulation, whereas the narrow-normal initialization only explores a limited region of the function space. The optimizers show a wider variety in their function-space exploration; as seen previously, SGDM explores a limited portion of the loss manifold whereas the other two optimizers cover a much wider range.

Fig 6. Low-dimensional representation of the function-space similarity for the weight initializations and optimizers reveals important takeaways.
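Continuing the sketch above (and again assuming the checkpointed weights are available as flattened vectors), the trajectories can be projected to two dimensions with scikit-learn’s t-SNE; the run labels and synthetic data below are placeholders for illustration.

```python
import numpy as np
from sklearn.manifold import TSNE

# Suppose `checkpoints_by_run` maps a run label (initialization/optimizer pair)
# to its list of flattened per-epoch weight vectors; synthetic data here.
rng = np.random.default_rng(1)
checkpoints_by_run = {
    "glorot-adam": [rng.normal(size=5_000) for _ in range(20)],
    "he-adam": [rng.normal(size=5_000) for _ in range(20)],
}

labels, vectors = [], []
for run, ckpts in checkpoints_by_run.items():
    labels += [run] * len(ckpts)
    vectors += ckpts

# Project every checkpoint to 2-D; perplexity must be smaller than the sample count.
embedding = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(
    np.stack(vectors)
)
print(embedding.shape)  # (n_checkpoints, 2); color by `labels` when plotting
```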

TL;DR

In summary, AutoML in the training loop provides a pathway not only to building robust neural networks suitable for scientific datasets, but also to improving our intuition about how the network training evolves!

To learn more about our work, please see the manuscript and the NeurIPS 2020 ML4Eng workshop spotlight talk here

This work will be featured at the Neural Information Processing Systems (NeurIPS) Machine Learning for Engineering Modeling, Simulation and Design Workshop 2020. Funding for this study came from the ICEnet Consortium, an industry-funded effort in building data-driven tools relevant to modeling Internal Combustion Engines. More details can be found at icenetcfd.com


References

[1]: Rick Stevens, Jeffrey Nichols, Katherine Yelick, Barbara Helland. "Report on the Department of Energy (DOE) Town Halls on Artificial Intelligence (AI) for Science". 2020. https://anl.app.box.com/s/f7m53y8beml6hs270h4yzh9l6cnmukph

[2]: Fagnan, Kjiersten, Nashed, Youssef, Perdue, Gabriel, Ratner, Daniel, Shankar, Arjun, and Yoo, Shinjae. "Data and Models: A Framework for Advancing AI in Science". United States: N. p., 2019.

https://www.osti.gov/biblio/1579323

[3]: Dias Ribeiro, Mateus, Alex Mendonça Bimbato, Maurício Araújo Zanardi, José Antônio Perrella Balestieri, and David P. Schmidt. "Large-eddy simulation of the flow in a direct injection spark ignition engine using an open-source framework". International Journal of Engine Research (2020): 1468087420903622.

Peetak Mitra is a final-year Ph.D. candidate at UMass Amherst and can be found on Twitter @peetak_mitra

