Surrogate Modeling

In part I of this series, we introduced the fundamental concepts of surrogate modeling. In part II, we saw surrogate modeling in action through a case study that walked through the full analysis pipeline.
To recap, surrogate modeling trains a cheap yet accurate statistical model to stand in for computationally expensive simulations, thus significantly improving the efficiency of product design and analysis.
In part III, we will briefly discuss the following three trends that have emerged in surrogate modeling research and applications:
- Gradient-enhanced surrogate modeling: incorporate the gradients at the training samples to improve model accuracy;
- Multi-fidelity surrogate modeling: assimilate training data with various fidelities to achieve higher training efficiency;
- Active learning: train surrogate models intelligently by actively selecting the next training sample.
Table of Contents
- 1. Gradient-enhanced surrogate models
  - 1.1 Basic idea
  - 1.2 Example
  - 1.3 Challenge
- 2. Multi-fidelity surrogate models
  - 2.1 Basic idea
  - 2.2 Example
- 3. Active learning
  - 3.1 Basic idea
  - 3.2 Expected prediction error (EPE)
- 4. Key takeaways
- Further reading
- About the Author
1. Gradient-enhanced surrogate models
1.1 Basic idea
Gradients are defined as the sensitivity of the output with respect to the inputs. Thanks to rapid developments in techniques like the adjoint method and automatic differentiation, it is now common for engineering simulation codes to not only compute the output f(x) given the input vector x, but also compute the gradients ∂f(x)/∂x at the same time, at negligible extra cost.
Consequently, we can expand our training data pairs (xᵢ, f(xᵢ)) into training data triples (xᵢ, f(xᵢ), ∂f(xᵢ)/∂x). By leveraging the additional gradient information, the trained surrogate model can reach higher accuracy than a model trained only on (xᵢ, f(xᵢ)), given that both models use the same number of training data points.
We can also state the benefit of including gradients in an equivalent way: it reduces the number of data points needed to achieve a given accuracy. This is a desirable feature in practice. Recall that generating each training data point requires running the expensive simulation code once. If we can cut down the total number of training data points, we can train the surrogate model with a smaller computational budget, thereby improving training efficiency.
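To make the data structure concrete, here is a minimal Python sketch of assembling such training triples for a hypothetical two-input simulation. The `simulation` function and its finite-difference gradient are illustrative stand-ins; in a real workflow, the solver would return the gradients directly via adjoints or automatic differentiation.

```python
import numpy as np

def simulation(x):
    """Hypothetical stand-in for an expensive simulation with 2 inputs."""
    return np.sin(x[0]) * np.exp(-x[1])

def simulation_grad(x, h=1e-6):
    """Gradient of the output w.r.t. the inputs, via central differences.
    A real solver would return this cheaply via adjoints or autodiff."""
    grad = np.zeros_like(x)
    for i in range(len(x)):
        step = np.zeros_like(x)
        step[i] = h
        grad[i] = (simulation(x + step) - simulation(x - step)) / (2 * h)
    return grad

# Training data as triples (x_i, f(x_i), grad f(x_i))
samples = np.random.default_rng(0).uniform(0.0, 1.0, size=(10, 2))
triples = [(x, simulation(x), simulation_grad(x)) for x in samples]
print(triples[0])
```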
1.2 Example
Let’s use an example to see how a gradient-enhanced surrogate model can boost prediction accuracy. In this example, we use a Gaussian process and its gradient-enhanced version as the surrogate models to approximate the function illustrated in Fig. 1.
Both surrogate models use the same training samples. For the gradient-enhanced Gaussian process, the gradients of y with respect to x at those training samples are also provided for training.

From the prediction results displayed in Fig. 1, we can clearly see that the gradient-enhanced version of the surrogate model achieves far better accuracy than its basic version, especially in the region around x=0.8: even though no training samples are allocated in that area, the gradient-enhanced model still manages to correctly capture the trend.
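To give a flavor of how the gradient information enters the model, below is a minimal, self-contained sketch of a 1D gradient-enhanced Gaussian process (gradient-enhanced Kriging) with a squared-exponential kernel and fixed hyperparameters. The toy function, sample locations, and hyperparameter values are illustrative assumptions, not the settings used to produce Fig. 1.

```python
import numpy as np

# Toy 1D "simulation" and its gradient (illustrative assumptions; a real
# code would return both via adjoints or automatic differentiation)
f = lambda x: np.sin(3 * x) + 0.5 * x
df = lambda x: 3 * np.cos(3 * x) + 0.5

X = np.array([0.0, 0.35, 0.65, 1.0])   # training locations
y, dy = f(X), df(X)                    # values and gradients

sigma2, ell = 1.0, 0.3                 # fixed kernel hyperparameters (assumed)

def k(a, b):                           # squared-exponential kernel
    return sigma2 * np.exp(-((a - b) ** 2) / (2 * ell ** 2))

# Joint covariance of the observed [values, derivatives]
A, B = np.meshgrid(X, X, indexing="ij")
K_ff = k(A, B)                                          # cov(f(x_i), f(x_j))
K_fd = K_ff * (A - B) / ell ** 2                        # cov(f(x_i), f'(x_j))
K_dd = K_ff * (1 / ell ** 2 - (A - B) ** 2 / ell ** 4)  # cov(f'(x_i), f'(x_j))
K = np.block([[K_ff, K_fd], [K_fd.T, K_dd]]) + 1e-10 * np.eye(2 * len(X))

alpha = np.linalg.solve(K, np.concatenate([y, dy]))

def predict(x_new):
    """Posterior mean at a scalar x_new (zero prior mean)."""
    k_f = k(x_new, X)                        # cov with value observations
    k_d = k_f * (x_new - X) / ell ** 2       # cov with gradient observations
    return np.concatenate([k_f, k_d]) @ alpha

xs = np.linspace(0.0, 1.0, 5)
print(np.column_stack([f(xs), [predict(x) for x in xs]]))
```

The key difference from a plain Gaussian process is the joint covariance matrix K: it contains not only the value-value covariances but also the value-derivative and derivative-derivative covariances, so the gradient observations directly constrain the fit.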
1.3 Challenge
Data explosion constitutes a major issue hindering the implementation of gradient-enhanced surrogate modeling.
First of all, as the number of input parameters increases, the amount of available information per sample grows quickly. For example, suppose there are 2 input parameters and we use 10 training samples. Each sample contributes one function value and two gradient components, so our total training data consists of 10 × (1 + 2) = 30 pieces of information.
Now, suppose we have to consider a total of 4 input parameters. As the number of inputs increases, we also need more samples for model training; say we use 20 samples. Our total training data would then consist of 20 × (1 + 4) = 100 pieces of information.
Therefore, the total amount of training data grows very fast as the number of input parameters increases. Having abundant training data is not necessarily a good thing, as it slows down model tuning (i.e., model hyperparameter optimization). In extreme cases, training the surrogate model can take even more time than running the simulation itself.
Second, in theory, higher-order derivatives can also be incorporated in the surrogate model training. This, too, leads to data explosion: the number of derivatives used in model training grows rapidly as the derivative order increases. For example, given 2 input parameters x₁ and x₂, the first-order derivatives contain only 2 terms (i.e., ∂f/∂x₁ and ∂f/∂x₂), while the second-order derivatives contribute 3 extra terms (i.e., ∂²f/∂x₁², ∂²f/∂x₂², and ∂²f/∂x₁∂x₂), for a total of 5 terms.
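The bookkeeping behind these numbers is simple: for d inputs, each sample contributes one function value plus, for each derivative order k, C(d + k − 1, k) distinct partial-derivative terms. A quick sanity check in Python (the helper `n_pieces` is just for illustration):

```python
from math import comb

def n_pieces(n_samples, n_inputs, max_order=1):
    """Values plus all distinct partial derivatives up to max_order, per sample."""
    per_sample = sum(comb(n_inputs + k - 1, k) for k in range(max_order + 1))
    return n_samples * per_sample

print(n_pieces(10, 2))               # 30: the first example above
print(n_pieces(20, 4))               # 100: the second example above
print(n_pieces(20, 4, max_order=2))  # 300: adding second-order derivatives
```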
Faced with the issue of training data explosion, we need to be more careful in deciding which derivatives of which samples go into the training dataset. Finding the right amount of gradient information so that the overall training effort is actually reduced constitutes an active research area.
2. Multi-fidelity surrogate models
2.1 Basic idea
In many instances across computational engineering, multiple simulation codes with varying fidelities and evaluation costs are available for the same output.
High-fidelity simulations resolve the underlying physical process at a finer spatial/temporal resolution. Although their results align more closely with reality, their computational cost is high as well. At the other end of the spectrum, we have low-fidelity simulations, which usually adopt coarser spatial/temporal resolutions and capture less physical detail. However, they run much faster than their high-fidelity counterparts.

Naturally, we want our surrogate models to have the same fidelity as the high-fidelity simulations. However, generating samples purely from high-fidelity simulations is rather expensive. So how can we obtain sufficient accuracy without paying too much for the surrogate model training?
One option is to generate only a small number of high-fidelity samples while generating a large number of low-fidelity samples (since they are cheap to produce). By aggregating samples from both fidelities, we can maximize the accuracy of the surrogate model while minimizing the associated training cost.
This is exactly what a multi-fidelity strategy tries to achieve. More specifically, the strategy uses the abundant low-fidelity samples to explore the parameter space and obtain a qualitatively (but not yet quantitatively) correct description of the general trend of the approximated input-output relation. Meanwhile, it leverages the available high-fidelity samples to refine the low-fidelity results, thus ensuring the trained surrogate model’s quantitative correctness.
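There are many ways to fuse the two fidelities (co-kriging, hierarchical Kriging, and so on). As a minimal illustration, the sketch below uses a simple additive-correction scheme with scikit-learn Gaussian processes: one GP is fit to many cheap low-fidelity samples, and a second GP models the discrepancy observed at a handful of high-fidelity points. The Forrester test functions and the sample locations are assumptions made for the sake of the example, not data from the case study.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# Forrester test functions, standing in for the expensive and cheap codes
def f_high(x):
    return (6 * x - 2) ** 2 * np.sin(12 * x - 4)

def f_low(x):
    return 0.5 * f_high(x) + 10 * (x - 0.5) - 5

x_lo = np.linspace(0, 1, 11)[:, None]           # many cheap samples
x_hi = np.array([[0.0], [0.4], [0.6], [1.0]])   # few expensive samples

kernel = ConstantKernel(1.0) * RBF(length_scale=0.2)

# Step 1: a GP captures the overall trend from abundant low-fidelity data
gp_lo = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp_lo.fit(x_lo, f_low(x_lo).ravel())

# Step 2: a second GP models the high- vs. low-fidelity discrepancy
# observed at the few expensive points
delta = f_high(x_hi).ravel() - gp_lo.predict(x_hi)
gp_delta = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp_delta.fit(x_hi, delta)

def predict_mf(x):
    """Additive-correction multi-fidelity prediction."""
    return gp_lo.predict(x) + gp_delta.predict(x)

x_test = np.linspace(0, 1, 5)[:, None]
print(np.column_stack([f_high(x_test).ravel(), predict_mf(x_test)]))
```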
2.2 Example
Let’s see an example of using the multi-fidelity approach to achieve the target model accuracy with only a few high-fidelity training samples.
In this example, our low/high-fidelity training samples are shown in Fig. 3(a), along with the true function we want to approximate. We can see that the low-fidelity samples are not accurate, as they lie away from the true function curve. However, they match the overall trend of the true function, which the multi-fidelity approach can leverage to improve the model training efficiency.

In Fig. 3(b), we see that the number of high-fidelity training samples is far from enough, as the fitted surrogate model is unable to capture the characteristics of the underlying function. In Fig. 3(c), however, by augmenting the few high-fidelity samples with a large number of qualitatively correct low-fidelity samples, a multi-fidelity approach can yield a much better prediction, which aligns perfectly with the true function.
3. Active learning
3.1 Basic idea
Pay less, get more.
In building a surrogate model, we want to use as few training samples as possible to reach the target model prediction accuracy. Recall that generating training samples involves running expensive computer simulations. As a result, fewer training samples means higher efficiency in obtaining the surrogate model.
Traditionally, training samples tend to be distributed evenly across the entire parameter space to guarantee model accuracy. However, this practice can waste significant computational resources: the approximated input-output relation is, in general, not equally "complex" in different regions of the parameter space, and therefore does not deserve the same amount of training data everywhere.
Instead, a smarter way would be to enrich the training dataset as the training progresses. In this way, the surrogate model gets to actively explore the landscape of the approximated input-output relation and add samples in regions where the model "believes" its predictions are inaccurate.
The learning function plays a key role in active learning, as it determines which sample to add to the existing training dataset. Crafting learning functions is an active research area. In general, learning functions differ from each other in terms of the goals they are pursuing.
In the following, we discuss one specific learning function, which aims to build a surrogate model that is accurate everywhere across the parameter space. Such a learning function is desirable when the trained surrogate model is later used for parametric studies, sensitivity analysis, and visualization of the input-output relation.
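In code, the overall active-learning loop is short. The sketch below uses a scikit-learn Gaussian process and, as a placeholder learning function, simply picks the candidate with the largest predictive standard deviation; section 3.2 refines this into a criterion that also accounts for bias. The 1D test function, candidate grid, and kernel settings are illustrative assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def expensive_sim(x):                      # hypothetical 1D simulation stand-in
    return np.sin(8 * x) + x

X = np.array([[0.1], [0.5], [0.9]])        # small initial design
y = expensive_sim(X).ravel()
candidates = np.linspace(0, 1, 201)[:, None]

for _ in range(10):                        # active-learning iterations
    gp = GaussianProcessRegressor(kernel=RBF(0.2), normalize_y=True).fit(X, y)
    # Learning function: here simply the predictive standard deviation;
    # section 3.2 refines this into the expected prediction error (EPE)
    _, std = gp.predict(candidates, return_std=True)
    x_new = candidates[[np.argmax(std)]]   # most "uncertain" candidate
    X = np.vstack([X, x_new])
    y = np.append(y, expensive_sim(x_new).ravel())

print(X.ravel())                           # where the samples were placed
```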
3.2 Expected prediction error (EPE)
This learning function allocates the next training sample to the location where the surrogate model has the largest expected prediction error. This makes intuitive sense, as that’s where the surrogate model can learn the fastest.
In machine learning, the expected prediction error can be written as the combination of a bias term and a variance term. This is the well-known bias-variance decomposition (for a noise-free, deterministic simulation):

EPE(x) = E[(f(x) − ŷ(x))²] = bias²(x) + variance(x),

where ŷ(x) is the surrogate prediction at x.
To implement this learning function, one requirement is that the employed surrogate model can estimate its own prediction uncertainty (i.e., the variance term). One type of surrogate model that fulfills this requirement is the Gaussian process.
Obviously, we do not know the true function value f(x) in advance (otherwise, we wouldn’t need to build a surrogate model to approximate it!). Therefore, the bias term in the above equation has to be estimated. One way to do that is through cross-validation. Detailed implementations are discussed by Liu et al. [1].
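Below is a simplified 1D sketch in the spirit of [1], not the authors’ exact implementation: the bias at each candidate location is approximated by the leave-one-out cross-validation error of its nearest training sample, and the GP’s predictive variance supplies the variance term.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def epe(X, y, candidates):
    """Expected prediction error at 1D candidate points: estimated bias^2
    (leave-one-out cross-validation errors, in the spirit of [1]) plus the
    GP predictive variance."""
    loo_err = np.empty(len(X))
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        gp_i = GaussianProcessRegressor(kernel=RBF(0.2), normalize_y=True)
        gp_i.fit(X[mask], y[mask])
        loo_err[i] = y[i] - gp_i.predict(X[i:i + 1])[0]

    gp = GaussianProcessRegressor(kernel=RBF(0.2), normalize_y=True).fit(X, y)
    _, std = gp.predict(candidates, return_std=True)

    # Each candidate inherits the LOO error of its nearest training sample
    nearest = np.argmin(np.abs(candidates - X.T), axis=1)
    return loo_err[nearest] ** 2 + std ** 2

# Plugging into the loop from section 3.1: the next sample goes where EPE peaks
X = np.array([[0.1], [0.4], [0.6], [0.9]])
y = (np.sin(8 * X) + X).ravel()
candidates = np.linspace(0, 1, 201)[:, None]
x_next = candidates[np.argmax(epe(X, y, candidates))]
print(x_next)
```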
4. Key takeaways
In this blog, we’ve discussed some advanced concepts in surrogate modeling:
- Gradient-enhanced surrogate modeling, which incorporates the gradients of the output with respect to the inputs during training to boost the model’s predictive accuracy.
- Multi-fidelity surrogate modeling, which aggregates a few quantitatively correct high-fidelity training data with many qualitatively correct low-fidelity training data, to train a highly accurate surrogate model with minimum computational effort.
- Active learning, which encourages the surrogate model to explore the parameter space actively and adds training samples in regions where it can learn the most.
Further reading:
[1] H. Liu, J. Cai, and Y.-S. Ong. An adaptive sampling approach for Kriging metamodeling by maximizing expected prediction error. Computers & Chemical Engineering, 106(2): 171–182, Nov. 2017.
[2] A. I. J. Forrester, A. Sóbester, and A. J. Keane. Engineering Design via Surrogate Modelling: A Practical Guide. Wiley, 2008.
About the Author
I’m a Ph.D. researcher working on uncertainty quantification and reliability analysis for aerospace applications. Statistics and data science form the core of my daily work. I love sharing what I’ve learned in the fascinating world of statistics. Check out my previous posts to find out more, and connect with me on Medium and LinkedIn.