
Solving Inverse Problems With Physics-Informed DeepONet: A Practical Guide With Code Implementation

Two case studies with parameter estimation and input function calibration

Photo by 愚木混株 cdd20 on Unsplash

In my previous blog, we delved into the concept of physics-informed DeepONet (PI-DeepONet) and explored why it is particularly suitable for operator learning, i.e., learning mappings from an input function to an output function. We also turned theory into code and implemented a PI-DeepONet that can accurately solve an ordinary differential equation (ODE) even with unseen input forcing profiles.

Figure 1. Operators transform one function into another, which is a concept frequently encountered in real-world dynamical systems. Operator learning essentially involves training a neural network model to approximate this underlying operator. A promising method to achieve that is DeepONet. (Image by author)

The ability to solve these forward problems with PI-DeepONet is certainly valuable. But is that all PI-DeepONet can do? Well, definitely not!

Another important problem category we frequently encounter in computational science and engineering is the so-called Inverse Problem. In essence, this type of problem reverses the flow of information from output to input: the input is unknown and the output is observable, and the task is to estimate the unknown input from the observed output.

Figure 2. In forward problems, the objective is to predict the outputs given the known inputs via the operator. In inverse problems, the process is reversed: known outputs are used to estimate the original, unknown inputs, often with only partial knowledge of the underlying operator. Both forward and inverse problems are commonly encountered in computational science and engineering. (Image by author)

As you might have guessed, PI-DeepONet can also be a super useful tool for tackling these types of problems. In this blog, we will take a close look at how that can be achieved. More concretely, we will address two case studies: one on parameter estimation, and the other on input function calibration.

This blog intends to be self-contained, so we will only briefly discuss the basics of physics-informed (PI) learning, DeepONet, and our main focus, PI-DeepONet. For a more comprehensive introduction to those topics, feel free to check out my previous blog.

With that in mind, let’s get started!

Table of Contents

· 1. PI-DeepONet: A refresher
· 2. Problem Statements
· 3. Problem 1: Parameter Estimation
 ∘ 3.1 How it works
 ∘ 3.2 Implementing a PI-DeepONet pipeline
 ∘ 3.3 Results discussion
· 4. Problem 2: Input Function Estimation
 ∘ 4.1 Solution strategies
 ∘ 4.2 Optimization routine: TensorFlow
 ∘ 4.3 Optimization routine: L-BFGS
· 5. Take-away
· Reference


1. PI-DeepONet: A refresher

As its name implies, PI-DeepONet is the combination of two concepts: physics-informed learning, and DeepONet.

Physics-informed learning is a new paradigm of machine learning that has gained particular traction in the domain of dynamical system modeling. Its key idea is to explicitly bake the governing differential equations into the machine learning model, often by introducing an additional loss term that accounts for the residuals of the governing equations. The premise of this learning approach is that a model built this way will respect known physical laws and offer better generalizability, interpretability, and trustworthiness.

DeepONet, on the other hand, resides in the traditional pure data-driven modeling domain. However, what’s unique about it is that DeepONet is specifically designed for operator learning, i.e., learning the mapping from an input function to an output function. This situation is frequently encountered in many dynamical systems. For instance, in a simple mass-spring system, the time-varying driving force serves as an input function (of time), while the resultant mass displacement is the output function (of time as well).

DeepONet introduces a novel network architecture (shown in Figure 3), where a branch net transforms the profile of the input function, and a trunk net transforms the temporal/spatial coordinates. The feature vectors output by these two nets are then merged via a dot product.

Figure 3. The architecture of DeepONet. The uniqueness of this method lies in its separation of branch and trunk nets to handle input function profiles and temporal/spatial coordinates, respectively. (Image by author)

Now, if we layer the concept of physics-informed learning on top of the DeepONet, we obtain what is known as PI-DeepONet.

Figure 4. Compared to a DeepONet, a PI-DeepONet contains extra loss terms such as the ODE/PDE residual loss, as well as the initial condition loss (IC loss) and boundary condition loss (BC loss). The conventional data loss is optional for PI-DeepONet, as it can directly learn the operator of the underlying dynamical system solely from the associated governing equations. (Image by author)

Once a PI-DeepONet is trained, it can predict the profile of the output function for a given new input function profile in real-time, while ensuring that the predictions align with the governing equations. As you can imagine, this makes PI-DeepONet a potentially very powerful tool for a diverse range of dynamic system modeling tasks.

However, in many other system modeling scenarios, we may need to perform the exact opposite operation: the outputs are observed, and we want to estimate the unknown inputs based on those observations and our prior knowledge of the system dynamics. Generally speaking, this type of scenario falls within the scope of inverse modeling. A question that naturally arises here is: can we also use PI-DeepONet to address inverse estimation problems?

Before we get into that, let’s first more precisely formulate the problems we are aiming to solve.


2. Problem Statements

We will use the same ODE discussed in the previous blog as our base model. Previously, we investigated an initial value problem described by the following equation:

ds(t)/dt = u(t)

with an initial condition s(0) = 0. In the equation, u(t) is the input function that varies over time, and s(t) denotes the state of the system at time t. Our previous focus was on solving the forward problem, i.e., predicting s(·) given u(·). Now, we will shift our focus and consider solving two types of inverse problems:

1️⃣ Estimating unknown input parameters

Let’s start with a straightforward inverse problem. Imagine our governing ODE has now evolved to be like this:

ds(t)/dt = a·u(t) + b

initial condition s(0) = 0, a and b are unknowns.

with a and b being the two unknown parameters. Our objective here is to estimate the values of a and b, given the observed u(·) and s(·) profiles.
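Since s(0) = 0, integrating the ODE gives a closed-form relation between the two observed profiles, which builds intuition for why jointly observing u(·) and s(·) pins down the two parameters:

$$s(t) = a \int_0^t u(\tau)\, d\tau + b\, t$$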

This type of problem falls within the scope of parameter estimation, where unknown parameters of the system need to be identified from measured data. Typical examples include system identification for control engineering, estimating material thermal coefficients in computational heat transfer, etc.

In our current case study, we will assume the true values for a and b are both 0.5.

2️⃣ Estimating the entire input function profile

For the second case study, we ramp up the problem complexity: suppose we know the ODE perfectly (i.e., we know the exact values of a and b), and we have observed the s(·) profile, but we don't yet know the u(·) profile that generated this observed output function. Consequently, our objective here is to estimate the u(·) profile, given the observed s(·) profile and the known ODE:

ds(t)/dt = a·u(t) + b

initial condition s(0) = 0, a=0.5, b=0.5.

Since we now aim to recover an entire input function instead of a small set of unknown parameters, this case study is much more challenging than the first one. Unfortunately, this type of problem is inherently ill-posed and requires strong regularization to help constrain the solution space. Nevertheless, it often arises in various fields, including environmental engineering (e.g., identifying the profile of pollutant sources), aerospace engineering (e.g., calibrating the applied loads on aircraft), and wind engineering (e.g., wind force estimation).

In the following two sections, we will address these two case studies one by one.


3. Problem 1: Parameter Estimation

In this section, we tackle the first case study: estimating the unknown parameters in our target ODE. We will start with a brief discussion of how physics-informed neural networks in general can solve this type of problem, followed by implementing a PI-DeepONet-based pipeline for parameter estimation. Afterward, we will apply it to our case study and discuss the obtained results.

3.1 How it works

In the original paper on physics-informed neural networks (PINNs), Raissi and co-authors outlined the strategy of using PINNs to solve inverse calibration problems: in essence, we simply set the unknown parameters (in our current case, a and b) as trainable parameters in the neural network, and optimize them together with the weights and biases of the neural net to minimize the loss function.

Of course, the secret sauce lies in constructing the loss function: as a physics-informed learning approach, it contains not only a data mismatch term, which measures the discrepancy between the network's predicted output and the observed data, but also a physics-informed regularization term, which calculates the residuals of the differential equation (i.e., the difference between its left- and right-hand sides) using the network outputs (and their derivatives) and the current estimates of the parameters a and b.
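Schematically, the joint loss looks like this (an illustrative formulation, with θ denoting the network weights and biases, N_d the number of data points, and N_c the number of collocation points; the code below also adds an initial-condition term and per-term weights):

$$\mathcal{L}(\theta, a, b) = \underbrace{\frac{1}{N_d} \sum_{i=1}^{N_d} \big( s_\theta(t_i) - s_{\text{obs}}(t_i) \big)^2}_{\text{data loss}} + \underbrace{\frac{1}{N_c} \sum_{j=1}^{N_c} \Big( \frac{d s_\theta}{dt}\Big|_{t_j} - a\, u(t_j) - b \Big)^2}_{\text{ODE residual loss}}$$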

Now, when we perform this joint optimization, we are effectively searching for a and b values that lead to network outputs that simultaneously fit the observed data and satisfy the governing differential equation. When the loss function reaches its minimum (i.e., the training has converged), the final values of a and b are the ones that achieve this balance, and they thus constitute the estimates of the unknown parameters.

Figure 5. When using PI-DeepONet, the unknown parameters a and b are jointly optimized with the weights and biases of the DeepONet model. When the training has converged, the a and b values we end up with constitute their estimates. (Image by author)

3.2 Implementing a PI-DeepONet pipeline

Enough about the theory, it’s time to see some code 💻

Let’s start with the definition of PI-DeepONet:

import numpy as np
import tensorflow as tf
from tensorflow import keras

def create_model(mean, var, a_init=None, b_init=None, trainable=None, verbose=False):
    """Definition of a DeepONet with fully connected branch and trunk layers.

    Args:
    ----
    mean: dictionary, mean values of the inputs
    var: dictionary, variance values of the inputs
    a_init: float, initial value for parameter a
    b_init: float, initial value for parameter b
    trainable: boolean, indicate whether the parameters a and b will be updated during training
    verbose: boolean, indicate whether to show the model summary

    Outputs:
    --------
    model: the DeepONet model
    """

    # Branch net
    branch_input = tf.keras.Input(shape=(len(mean['forcing']),), name="forcing")
    branch = tf.keras.layers.Normalization(mean=mean['forcing'], variance=var['forcing'])(branch_input)
    for i in range(3):
        branch = tf.keras.layers.Dense(50, activation="tanh")(branch)

    # Trunk net
    trunk_input = tf.keras.Input(shape=(len(mean['time']),), name="time")
    trunk = tf.keras.layers.Normalization(mean=mean['time'], variance=var['time'])(trunk_input)   
    for i in range(3):
        trunk = tf.keras.layers.Dense(50, activation="tanh")(trunk)

    # Merge results 
    dot_product = tf.reduce_sum(tf.multiply(branch, trunk), axis=1, keepdims=True)

    # Add the bias
    dot_product_with_bias = BiasLayer()(dot_product)

    # Add a & b trainable parameters
    output = ParameterLayer(a_init, b_init, trainable)(dot_product_with_bias)

    # Create the model
    model = tf.keras.models.Model(inputs=[branch_input, trunk_input], outputs=output)

    if verbose:
        model.summary()

    return model   

In the code above:

  1. We start by defining the branch and trunk networks (both simple fully connected nets). The feature vectors produced by the two nets are merged via a dot product.
  2. We then add a bias term on top of the dot product result, which is achieved by defining a custom BiasLayer() (a minimal sketch of this layer is given after the ParameterLayer code below). This strategy can improve prediction accuracy, as indicated in the original DeepONet paper.
  3. Here comes the main change that makes our model capable of solving parameter estimation problems: we add a and b to the collection of trainable model parameters. This way, when we set trainable to True, a and b will be optimized together with the usual weights and biases of the neural network. Technically, we achieve this goal by defining a custom layer:
class ParameterLayer(tf.keras.layers.Layer):
    """Identity layer whose only job is to attach a and b as trainable weights."""

    def __init__(self, a, b, trainable=True, **kwargs):
        super(ParameterLayer, self).__init__(**kwargs)
        self._a = tf.convert_to_tensor(a, dtype=tf.float32)
        self._b = tf.convert_to_tensor(b, dtype=tf.float32)
        self.trainable = trainable

    def build(self, input_shape):
        self.a = self.add_weight("a", shape=(1,), 
                                 initializer=tf.keras.initializers.Constant(value=self._a),
                                 trainable=self.trainable)
        self.b = self.add_weight("b", shape=(1,), 
                                 initializer=tf.keras.initializers.Constant(value=self._b),
                                 trainable=self.trainable)

    def call(self, inputs):
        # Pass the input through unchanged
        return inputs

    def get_config(self):
        # Include the layer's own arguments so the model can be re-created
        config = super().get_config()
        config.update({"a": float(self._a.numpy()),
                       "b": float(self._b.numpy()),
                       "trainable": self.trainable})
        return config

    @classmethod
    def from_config(cls, config):
        return cls(**config)

Note that this layer passes its input through unchanged; its only purpose is to introduce the two parameters a and b as trainable model attributes.
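For completeness, the custom BiasLayer() referenced in create_model() can be as simple as adding one trainable scalar to the dot-product output. A minimal sketch (the notebook's implementation may differ in details):

class BiasLayer(tf.keras.layers.Layer):
    """Adds a single trainable bias to the merged branch-trunk output."""

    def build(self, input_shape):
        self.bias = self.add_weight("bias", shape=(1,),
                                    initializer=tf.keras.initializers.Zeros(),
                                    trainable=True)

    def call(self, inputs):
        return inputs + self.bias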

Next, we define the function to calculate ODE residuals, which will serve as the physics-informed loss term:

@tf.function
def ODE_residual_calculator(t, u, u_t, model):
    """ODE residual calculation.

    Args:
    ----
    t: temporal coordinate
    u: input function evaluated at discrete temporal coordinates
    u_t: input function evaluated at t
    model: DeepONet model

    Outputs:
    --------
    ODE_residual: residual of the governing ODE
    """

    with tf.GradientTape() as tape:
        tape.watch(t)
        s = model({"forcing": u, "time": t})

    # Calculate gradients
    ds_dt = tape.gradient(s, t)

    # ODE residual
    ODE_residual = ds_dt - model.layers[-1].a*u_t - model.layers[-1].b

    return ODE_residual

Note that we have used model.layers[-1] to retrieve the parameter layer we defined earlier. This would give us the a and b values to calculate the ODE residuals.

Next up, we define the logic for calculating the gradients of total loss with respect to the parameters (including both usual weights and biases, as well as the unknown parameters a and b). This will prepare us for performing the gradient descent for model training:

@tf.function
def train_step(X, y, X_init, IC_weight, ODE_weight, data_weight, model):
    """Calculate gradients of the total loss with respect to network model parameters.

    Args:
    ----
    X: training dataset for evaluating ODE residuals
    y: target value of the training dataset
    X_init: training dataset for evaluating initial conditions
    IC_weight: weight for initial condition loss
    ODE_weight: weight for ODE loss
    data_weight: weight for data loss
    model: DeepONet model

    Outputs:
    --------
    ODE_loss: calculated ODE loss
    IC_loss: calculated initial condition loss
    data_loss: calculated data loss
    total_loss: weighted sum of ODE loss, initial condition loss, and data loss
    gradients: gradients of the total loss with respect to network model parameters.
    """

    with tf.GradientTape() as tape:
        tape.watch(model.trainable_weights)

        # Initial condition prediction
        y_pred_IC = model({"forcing": X_init[:, 1:-1], "time": X_init[:, :1]})

        # Equation residual
        ODE_residual = ODE_residual_calculator(t=X[:, :1], u=X[:, 1:-1], u_t=X[:, -1:], model=model)

        # Data loss
        y_pred_data = model({"forcing": X[:, 1:-1], "time": X[:, :1]})

        # Calculate loss
        IC_loss = tf.reduce_mean(keras.losses.mean_squared_error(0, y_pred_IC))
        ODE_loss = tf.reduce_mean(tf.square(ODE_residual))
        data_loss = tf.reduce_mean(keras.losses.mean_squared_error(y, y_pred_data))

        # Weight loss
        total_loss = IC_loss*IC_weight + ODE_loss*ODE_weight + data_loss*data_weight

    gradients = tape.gradient(total_loss, model.trainable_variables)

    return ODE_loss, IC_loss, data_loss, total_loss, gradients

Note that our loss function consists of three terms: the initial condition loss, the ODE residual loss, and the data loss. For forward problems, the data loss is optional, as the model can directly learn the underlying operator solely based on the given differential equations. However, for inverse problems (as in our current parameter estimation case), incorporating a data loss is essential to ensure we find the correct a and b values. For our current case, it is sufficient to simply set all weights, i.e., IC_weight, ODE_weight, and data_weight, to 1.

Now we are ready to define the main training logic:

# Set up training configurations
n_epochs = 100
IC_weight = tf.constant(1.0, dtype=tf.float32)
ODE_weight = tf.constant(1.0, dtype=tf.float32)
data_weight = tf.constant(1.0, dtype=tf.float32)

# Initial values for the unknown parameters
a, b = 1, 1

# Optimizer
optimizer = keras.optimizers.Adam(learning_rate=1e-3)

# Instantiate the PI-DeepONet model
PI_DeepONet = create_model(mean, var, a_init=a, b_init=b, trainable=True)
PI_DeepONet.compile(optimizer=optimizer)

# Start training process
for epoch in range(1, n_epochs + 1):  
    print(f"Epoch {epoch}:")

    for (X_init, _), (X, y) in zip(ini_ds, train_ds):

        # Calculate gradients
        ODE_loss, IC_loss, data_loss, total_loss, gradients = train_step(X, y, X_init, 
                                                                        IC_weight, ODE_weight,
                                                                        data_weight, PI_DeepONet)
        # Gradient descent
        PI_DeepONet.optimizer.apply_gradients(zip(gradients, PI_DeepONet.trainable_variables))

    # Re-shuffle dataset
    ini_ds = tf.data.Dataset.from_tensor_slices((X_train_ini, y_train_ini))
    ini_ds = ini_ds.shuffle(5000).batch(ini_batch_size)

    train_ds = tf.data.Dataset.from_tensor_slices((X_train, y_train))
    train_ds = train_ds.shuffle(100000).batch(col_batch_size) 

Here we set the same initial value of 1 for both a and b. Recall that the true values of a and b are actually 0.5. In the following, let’s train the PI-DeepONet model and see if it can recover the true values of a and b.

Please note that the code discussed here consists only of key snippets extracted from the full training/validation logic. To see the complete code, check out the notebook.

3.3 Results discussion

To use PI-DeepONet to estimate unknown ODE parameters, we first need to generate the training dataset. In our current case, the data generation proceeds in two steps: first, we generate the u(·) profiles (i.e., the input functions) using a zero-mean Gaussian process with a radial basis function (RBF) kernel. Afterward, we run scipy.integrate.solve_ivp on our target ODE (with both a and b set to 0.5) to calculate the corresponding s(·) profiles (i.e., the output functions). The figure below shows three random samples used for training; a minimal code sketch of this data-generation step follows the figure.

Figure 6. Three samples were randomly selected from the training dataset for illustration purposes. The upper row shows the u(·) profiles, while the lower row shows the corresponding s(·) profiles calculated by running an ODE solver. (Image by author)
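A minimal sketch of this data-generation step (the 0.2 length scale, the random seed, and the 100-point time grid are illustrative assumptions; the notebook may use different settings):

import numpy as np
from scipy.integrate import solve_ivp

def sample_gp(t, length_scale=0.2, n_samples=1, seed=None):
    """Sample zero-mean Gaussian process paths with an RBF kernel on the grid t."""
    rng = np.random.default_rng(seed)
    diff = t[:, None] - t[None, :]
    cov = np.exp(-0.5 * (diff / length_scale) ** 2)  # RBF covariance matrix
    cov += 1e-10 * np.eye(len(t))                    # jitter for numerical stability
    return rng.multivariate_normal(np.zeros(len(t)), cov, size=n_samples)

# One random input profile on a 100-point grid over [0, 1]
t_grid = np.linspace(0.0, 1.0, 100)
u_profile = sample_gp(t_grid, seed=42)[0]

# Integrate ds/dt = a*u(t) + b with s(0) = 0, interpolating u between grid points
a_true, b_true = 0.5, 0.5
sol = solve_ivp(lambda t, s: a_true * np.interp(t, t_grid, u_profile) + b_true,
                t_span=(0.0, 1.0), y0=[0.0], t_eval=t_grid)
s_profile = sol.y[0]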

Once the training dataset is ready, we can kick off the PI-DeepONet training process. The following figure displays the loss evolution, which indicates that the model is properly trained and is able to fit both the data and ODE very well.

Figure 7. Loss convergence plot. (Image by author)

Training the PI-DeepONet is not the end. Instead, our ultimate goal is to estimate the values of a and b. The following figure depicts the evolution of the unknown parameters. We can clearly see that our PI-DeepONet is doing its job and has accurately estimated the true values of a and b.

Figure 8. Our unknown parameters a and b gradually moved away from the specified initial values and converged to their true values. This demonstrates that PI-DeepONet is capable of performing inverse parameter calibration for differential equations with functional inputs. (Image by author)

We can further test the sensitivity of the estimation results with respect to the initial values. The following figure shows the evolution of the parameters under a different set of initial values (a_init=0.2, b_init=0.8):

Figure 9. For a different set of initial values, our developed PI-DeepONet model is able to accurately estimate the true values of a and b. (Image by author)

Overall, we can see that the PI-DeepONet model is capable of performing parameter calibrations when the input to the ODE is a function.


4. Problem 2: Input Function Estimation

Time to level up our game! In this section, we will tackle a more challenging case study: estimating the entire profile of the input function u(·). Similar to the previous section, we will start with a brief discussion of the solution strategies, followed by implementing the proposed strategies to address our target problem. We will look into two different strategies: one employing native TensorFlow and another using the L-BFGS optimization algorithm.

4.1 Solution strategies

To devise our solution strategy, we need to consider two things:

1️⃣ Optimization efficiency

At its core, our target inverse problem requires optimizing the input profile u(·) such that the predicted s(·) matches with the observed s(·). Assuming that we can calculate s(·) given u(·), many off-the-shelf optimizers can be used to achieve our goal. Now the question is, how do we calculate s(·) given u(·)?

The traditional method would be to employ a standard numerical ODE solver to predict the output function s(·) given the input function u(·). However, optimization typically involves many iterations: in every iteration of the loop, u(·) is updated and a new s(·) must be computed. Calling a numerical ODE solver every time would therefore be too inefficient. This is where PI-DeepONet comes into play.

As a first step, we need a fully trained PI-DeepONet. Since we have full knowledge of the target ODE (recall that in this case study, we assume we know the true values of a and b), we can easily train a PI-DeepONet solely based on ODE residual loss.

Once the PI-DeepONet is trained, we can then freeze its weights & biases, treat it as a fast "surrogate" model that can efficiently calculate s(·) given any u(·), and embed this trained PI-DeepONet into the optimization routine. Since the inference time of PI-DeepONet is negligible, the computational cost can be greatly reduced. Meanwhile, the trained PI-DeepONet model will be fully differentiable. This suggests that we are able to provide the gradients of the optimization objective function with respect to the u(·), thus harnessing the efficiency of gradient-based optimization algorithms. Both of those aspects make iterative optimization feasible.
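In TensorFlow, freezing the trained network is a one-liner (a small sketch; in the routines below we only ever take gradients with respect to u(·), so this mainly acts as a safeguard):

# Freeze every weight of the trained surrogate; only u(·) will be optimized
PI_DeepONet.trainable = False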

2️⃣ Optimization quantity

When we talk about estimating the profile of u(·), we are not attempting to estimate the symbolic functional form of u(·). Instead, we are estimating discrete u(·) values evaluated at a fixed set of time coordinates t₁, t₂, etc. This also aligns with how we feed u(·) into the PI-DeepONet.

Generally, we need a large number of discrete u(·) values to sufficiently describe the profile of u(·). Consequently, our optimization problem is high-dimensional, with many parameters to optimize. This type of problem is inherently ill-posed: simply finding a u(·) that generates a matching s(·) is not a strong enough constraint, as there could easily be many u(·)'s that generate the exact same s(·) profile. We need stronger regularization to help constrain the solution space.

So what should we do? Well, we can leverage the constraint imposed by the known ODE. Simply put, our goal is to find a u(·) that not only generates an s(·) matching the observed s(·), but also satisfies the known ODE that dictates the relationship between u(·) and s(·).

Figure 10. Illustration of the optimization strategy. (Image by author)

Now with those points clarified, let’s proceed to code up the optimization routine.

4.2 Optimization routine: TensorFlow

As we have discussed previously, we start by training a PI-DeepONet that observes the known governing ODE. We can reuse the code developed for the first case study. The only changes we need to introduce are:

  1. Set a_init and b_init to 0.5, as these are the true values of a and b.
  2. Set the trainable flag of the ParameterLayer as False. This will prevent updating a and b values during backpropagation.
  3. Set data_weight to 0, as we don’t need the paired u(·)-s(·) for training. The PI-DeepONet can be trained solely based on ODE residual loss, as we have shown in the previous blog.
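Put together, the changed lines relative to the first case study look roughly like this:

# True parameter values are known in this case study
a, b = 0.5, 0.5

# Freeze a and b inside the ParameterLayer: they are no longer estimated
PI_DeepONet = create_model(mean, var, a_init=a, b_init=b, trainable=False)

# No paired u(·)-s(·) data needed: train on ODE residual (and IC) loss only
data_weight = tf.constant(0.0, dtype=tf.float32)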

The following figure shows the training process: the loss values converged, and the PI-DeepONet has been properly trained.

Figure 11. Loss convergence of training PI-DeepONet without paired input-output data. (Image by author)

Next, we define the objective function and its gradients with respect to u(·) for our optimization process:

@tf.function
def gradient_u(model, u, s_target):
    """ Calculate the gradients of the objective function with respect to u(·).

    Args:
    ----
    model: the trained PI-DeepONet model
    u: the current u()
    s_target: the observed s()

    Outputs:
    --------
    obj: objective function
    gradients: calculated gradients
    """

    with tf.GradientTape() as tape:
        tape.watch(u)

        # Initial condition loss
        y_pred_IC = model({"forcing": u, "time": 0})
        IC_loss = tf.reduce_mean(keras.losses.mean_squared_error(0, y_pred_IC))

        # ODE residual
        ODE_residual = ODE_residual_calculator(t=tf.reshape(tf.linspace(0.0, 1.0, 100), (-1, 1)), 
                                               u=tf.tile(tf.reshape(u, (1, -1)), [100, 1]), 
                                               u_t=tf.reshape(u, (-1, 1)), model=model)
        ODE_loss = tf.reduce_mean(tf.square(ODE_residual))

        # Predict s() with the current u()
        y_pred = model({"forcing": tf.tile(tf.reshape(u, (1, -1)), [100, 1]), 
                        "time": tf.reshape(tf.linspace(0.0, 1.0, 100), (-1, 1))})

        # Data loss
        data_loss = tf.reduce_mean(keras.losses.mean_squared_error(s_target, y_pred))

        # Objective function
        obj = IC_loss + ODE_loss + data_loss

    # Calculate gradients of the objective with respect to u(·)
    gradients = tape.gradient(obj, u)

    return obj, gradients

As you can see, we mostly repurposed the train_step() function developed in the first case study to define the objective function for optimization. The major difference between the two functions is that here, we calculate the gradients of the objective function (i.e., the total loss) with respect to the input u(·), whereas train_step() calculates the gradients of the total loss with respect to the neural network model parameters. This makes sense: previously, we wanted to optimize the neural network parameters (i.e., weights and biases), while here we want to optimize the input u(·).

Also note that we discretize u(·) with 100 points equally spaced within the domain [0, 1]. Therefore, the u(·) estimation task at hand essentially amounts to optimizing those 100 discrete u(·) values.

Next, we define the main logic for optimizing the u(·):

def optimize_u(model, initial_u, s_target, learning_rate=5e-3, opt_steps=10000):
    """ Determine u() that generates the observed s().

    Args:
    ----
    model: the trained PI-DeepONet model
    initial_u: initial guess for u()
    s_target: the observed s()
    learning_rate: learning rate for the optimizer
    opt_steps: number of optimization iterations

    Outputs:
    --------
    u: the optimized u()
    objective_func_hist: value history of the objective function
    u_hist: intermediate u results
    """

    # Optimizer
    optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)

    # Preparation
    u_var = tf.Variable(initial_u, dtype=tf.float32)
    objective_func_hist = []
    u_hist = []

    for step in range(opt_steps):

        # Calculate gradients
        obj, gradients = gradient_u(model, u_var, s_target)

        # Gradient descent
        optimizer.apply_gradients([(gradients, u_var)])

        # Record results
        objective_func_hist.append(obj.numpy())
        u_hist.append(u_var.numpy())

        # Show progress
        if step % 500 == 0:
            print(f'Step {step} ==> Objective function = {obj.numpy()}')

    # Optimized u()
    u = u_var.numpy()

    return u, objective_func_hist, u_hist

Afterward, we can kick off the optimization process. For this case study, we randomly generate one u(·) profile and use the numerical solver to calculate its corresponding s(·) profile. Then, we assign the observed s(·) to s_target and test if our optimization algorithm can indeed identify the u(·) we used to generate the s(·).

For initialization, we simply use an all-zero profile for u(·) and assign it to initial_u, as in the hypothetical driver call below.
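A hypothetical driver call (assuming s_target holds the observed s(·) sampled on the same 100-point time grid):

# All-zero initial guess on the 100-point grid
initial_u = np.zeros(100, dtype=np.float32)

# Run the Adam-based routine defined above
u_opt, obj_hist, u_hist = optimize_u(PI_DeepONet, initial_u, s_target)

The following figure shows the convergence of the objective function.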

Figure 12. We used Adam optimizer with a fixed learning rate (5e-3) to optimize u(·). (Image by author)

The following figure shows the evolution of the u(·) profile. We can see that as the number of iterations increased, the u(·) profile gradually converged to the true u(·).

Figure 13. During the inverse calibration process, u(·) gradually converged to the true u(·), indicating that the PI-DeepONet approach is able to successfully perform inverse calibration of the input function profile. (Image by author)

We can then perform the usual forward simulation to obtain the predicted s(·) given the optimized u(·). The forward simulation was conducted via both the numerical ODE solver and the trained PI-DeepONet. We can see that the predicted s(·)'s follow the ground truth closely.

Figure 14. Forward simulation to calculate the s(·) given the optimized u(·). The prediction results match very well with the ground truth. (Image by author)

We can further test if our optimization strategy also worked for other u(·) profiles. The following plots show six more random u(·) profiles we used in the testing:

Figure 15. For different u(·) profiles, we ran the same PI-DeepONet-based optimization routine with all-zero initialization. The results showed that our strategy has managed to deliver satisfactory results. (Image by author)

Overall, the results showed that the PI-DeepONet approach can successfully perform inverse calibration for the entire input function profile.

4.3 Optimization routine: L-BFGS

The optimization routine we developed above operated entirely in the TensorFlow environment. As a matter of fact, we can also leverage quasi-Newton methods that exploit approximate second-order information, such as L-BFGS as implemented in SciPy, to achieve our goal. In this section, we will look into how to integrate our developed PI-DeepONet into the L-BFGS optimization workflow.

If you would like to learn more about L-BFGS approach, take a look at this blog post: Numerical optimization based on the L-BFGS method

To begin with, we define the objective function for the L-BFGS optimizer. The objective function is again a combination of data loss and ODE residual loss:

def obj_fn(u_var, model, s_target, convert_to_numpy=False):
    """ Calculate objective function.

    Args:
    ----
    u_var: u()
    model: the trained PI-DeepONet model
    s_target: the observed s()
    convert_to_numpy: boolean, indicate whether to convert the 
                      calculated objective value to numpy

    Outputs:
    --------
    obj: objective function
    """

    # Initial condition prediction
    y_pred_IC = model({"forcing": u_var, "time": 0})

    # ODE residual
    ODE_residual = ODE_residual_calculator(t=tf.reshape(tf.linspace(0.0, 1.0, 100), (-1, 1)), 
                                           u=tf.tile(tf.reshape(u_var, (1, -1)), [100, 1]), 
                                           u_t=tf.cast(tf.reshape(u_var, (-1, 1)), tf.float32), model=model)

    # Predict the output with the current inputs
    y_pred = model({"forcing": tf.tile(tf.reshape(u_var, (1, -1)), [100, 1]), 
                    "time": tf.reshape(tf.linspace(0.0, 1.0, 100), (-1, 1))})

    # Compute the data loss
    data_loss = tf.reduce_mean(keras.losses.mean_squared_error(s_target, y_pred))

    # Calculate other losses
    IC_loss = tf.reduce_mean(keras.losses.mean_squared_error(0, y_pred_IC))
    ODE_loss = tf.reduce_mean(tf.square(ODE_residual))

    # Obj function
    obj = IC_loss + ODE_loss + data_loss
    obj = obj if not convert_to_numpy else obj.numpy().astype(np.float64)

    return obj

Additionally, we define two thin wrappers around obj_fn: one returning a NumPy value (for SciPy), and one returning a TensorFlow tensor (for gradient computation).

def obj_fn_numpy(u_var, model, s_target):
    return obj_fn(u_var, model, s_target, convert_to_numpy=True)

def obj_fn_tensor(u_var, model, s_target):
    return obj_fn(u_var, model, s_target, convert_to_numpy=False)

The obj_fn_numpy will be used by L-BFGS in SciPy to calculate the main objective function, while the obj_fn_tensor will be used by TensorFlow to calculate the gradients of the objective function with respect to u(·), as shown below:

def grad_fn(u, model, s_target):
    u_var = tf.Variable(u, dtype=tf.float32)

    with tf.GradientTape() as tape:
        tape.watch(u_var)
        obj = obj_fn_tensor(u_var, model, s_target)

    # Calculate gradients
    grads = tape.gradient(obj, u_var)

    return grads.numpy().astype(np.float64)

Now we are ready to define the main optimization logic:

from scipy.optimize import minimize

def optimize_u(model, initial_u, s_target, opt_steps=1000):
    """Optimize the inputs to a model to achieve a target output.

    Args:
    ----
    model: the trained PI-DeepONet model
    initial_u: initial guess for u()
    s_target: the observed s()
    opt_steps: number of optimization iterations

    Outputs:
    --------
    res.x: the optimized u
    """

    # L-BFGS optimizer 
    res = minimize(fun=obj_fn_numpy, x0=initial_u, args=(model, s_target),
                   method='L-BFGS-B', jac=grad_fn, options={'maxiter': opt_steps, 'disp': True})

    return res.x
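A hypothetical call mirroring the earlier setup (note that this optimize_u supersedes the TensorFlow version from Section 4.2, and that SciPy passes u to obj_fn_numpy and grad_fn as a flat float64 array):

# All-zero initial guess; at most 1000 L-BFGS iterations
initial_u = np.zeros(100)
u_opt = optimize_u(PI_DeepONet, initial_u, s_target, opt_steps=1000)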

To test the effectiveness of the L-BFGS algorithm, we adopted the same u(·) and s(·) investigated at the beginning of the previous subsection. We also set the initial u(·) to be an all-zero profile. After running 1000 L-BFGS iterations, we obtained the following optimization results:

Figure 16. The calibrated u(·) and the predicted s(·). (Image by author)

We can see that the L-BFGS algorithm also achieved the desired results and successfully estimated the u(·) profile. We further inspected six more u(·) profiles and obtained similar results.

Figure 17. Calibration results with L-BFGS algorithm for different u(·) profiles. (Image by author)

5. Take-away

In this blog, we investigated how to use PI-DeepONet, a novel network architecture that combines the strengths of physics-informed learning and operator learning, to solve inverse problems. We looked into two case studies:

1️⃣ Parameter estimation

Our goal is to estimate the unknown ODE parameters given the observed input function u(·) and output function s(·). Our strategy is to incorporate those unknown parameters as trainable parameters in the neural network, and optimize them jointly with the weights and biases of the neural network.

2️⃣ Input function calibration

Our goal is to estimate the entire u(·) profile given the perfectly known ODE and the observed s(·). Our strategy is to first train a PI-DeepONet based on the known ODE to serve as a fast surrogate model, and then optimize the discretized u(·) values such that the generated s(·) not only matches the observed truth, but also fulfills the known ODE that connects u(·) and s(·).

For both case studies, we have seen satisfactory results produced by the PI-DeepONet approach, suggesting that this approach can be used to effectively address both forward and inverse problems, thus serving as a promising tool for dynamical system modeling.

Inverse problems pose unique challenges in computational science and engineering, and we have only scratched the surface in this blog. For many real-life applications, we need to further consider other aspects:

1️⃣ Uncertainty quantification (UQ)

In this blog, we simply assumed the observed input and output function values are perfect. However, in practical scenarios, both the input and output data may be contaminated by noise. Understanding how this uncertainty propagates through our model is essential for making reliable estimations.

2️⃣ Modeling error

In this blog, we assumed our ODE perfectly describes the dynamics of the system. However, in practical scenarios, this assumption will most likely not hold: the differential equation might only be an approximation of the true underlying physical processes, with missing physics not captured by the equation. How to properly account for the resulting modeling errors is a challenging yet interesting aspect to explore.

3️⃣ Extending to PDE

In this blog, we demonstrated the workflow of solving inverse problems considering only a simple ODE. However, many practical systems are governed by PDEs with more spatial/temporal coordinates and complex input/output functional profiles. More challenges can be expected when we extend our developed methodology to PDEs (e.g., computational efficiency will likely be a major issue, not only in training the PI-DeepONet but also in performing optimization for inverse calibration).

Congrats on reaching this far! If you find my content useful, you could buy me a coffee here 🤗 Thank you very much for your support!

You can find the companion notebooks with full code here 💻

To learn more about PI-DeepONet for forward modeling: Operator Learning via Physics-Informed DeepONet

To learn the best practices of physics-informed learning: Unraveling the Design Pattern of Physics-Informed Neural Networks

You can also subscribe to my newsletter or follow me on Medium.

Reference

[1] Wang et al., Learning the solution operator of parametric partial differential equations with physics-informed DeepONets. arXiv, 2021.

