The world’s leading publication for data science, AI, and ML professionals.

Operator Learning via Physics-Informed DeepONet: Let’s Implement It From Scratch

A deep dive into the DeepONets, physics-informed neural networks, and physics-informed DeepONets

Figure 1. ODE/PDEs are widely used to describe the system processes. In many scenarios, those ODE/PDEs accept a function (e.g., the forcing function u(t)) as input and output another function (e.g., s(t)). Traditionally, numerical solvers are used to connect the input and output. More recently, neural operators are developed to address the same problem but with much higher efficiency. (Image by Author)
Figure 1. ODE/PDEs are widely used to describe the system processes. In many scenarios, those ODE/PDEs accept a function (e.g., the forcing function u(t)) as input and output another function (e.g., s(t)). Traditionally, numerical solvers are used to connect the input and output. More recently, neural operators are developed to address the same problem but with much higher efficiency. (Image by Author)

Ordinary and partial differential equations (ODEs/PDEs) are the backbone of many disciplines in science and engineering, from physics and biology to economics and climate science. They are fundamental tools used to describe physical systems and processes, capturing the continuous change of quantities over time and space.

Yet, a unique trait of many of these equations is that they don’t just take single values as inputs, they take functions. For example, consider the case of predicting vibrations in a building due to an earthquake. The shaking of the ground, which varies over time, can be represented as a function that acts as the input to the differential equation describing the building’s motion. Similarly, in the case of sound waves propagating in a concert hall, the sound waves produced by a musical instrument can be an input function with varying volume and pitch over time. These varying input functions fundamentally influence the resulting output functions – the building’s vibrations and the acoustic field in the hall, respectively.

Traditionally, these ODEs/PDEs are tackled using numerical solvers like finite difference or finite element methods. However, these methods come with a bottleneck: for every new input function, the solver must be run all over again. This process can be computationally intensive and slow, particularly for intricate systems or high-dimensional inputs.

To address this challenge, a novel framework was introduced by Lu et al. in 2019: the Deep Operator Network, or Deeponet. DeepONets aim to learn the operator that maps input functions to output functions, essentially learning to predict the output of these ODEs/PDEs for any given input function without having to re-run a numerical solver each time.

But DeepONets, though powerful, inherited the common problems faced by data-driven methods: How can we ensure that the predictions of the network are in line with the known laws of physics encapsulated in the governing equations?

Enter physics-informed learning.

Physics-informed learning is a rapidly evolving branch of Machine Learning that combines physical principles with data science to enhance the modeling and understanding of complex physical systems. It involves leveraging domain-specific knowledge and physical laws to guide the learning process and improve the accuracy, generalization, and interpretability of machine learning models.

Under this framework, in 2021, Wang et al. introduced a new variant of DeepONets: the Physics-Informed DeepONet. This innovative approach builds on the foundation of DeepONets by incorporating our understanding of physical laws into the learning process. We’re no longer just asking our model to learn from data; we’re guiding it with principles derived from centuries of scientific inquiry.

This seems to be quite a promising approach! But how should we implement it in practice? That’s precisely what we’re going to explore today 🤗

In this blog, we’ll discuss the theory behind Physics-Informed DeepONet, and walk through how to implement it from scratch. We’ll also put our developed model into action and demonstrate its power through a hands-on case study.

If you are also interested in using physics-informed DeepONet to solve inverse problems, feel free to check out my new blog here: Solving Inverse Problems With Physics-Informed DeepONet: A Practical Guide With Code Implementation

Let’s get started!

Table of Content · 1. Case study · 2. Physics-informed DeepONet2.1 DeepONet: An overview2.2 Physics-informed Neural Networks (PINNs)2.3 Physics-informed DeepONet · 3. Implementation of Physics-informed DeepONet3.1 Define the Architecture3.2 Define ODE loss3.3 Define gradient descent step · 4. Data Generation and Organization4.1 Generation of u(·) profiles4.2 Generation of Dataset4.3 Dataset Organization · 5. Training Physics-informed DeepONet · 6. Results Discussion · 7. Take-away · Reference


1. Case study

Let’s ground our discussion in a concrete example. In this blog, we will reproduce the first case study considered in Wang et al.‘s original paper, i.e., an initial value problem described by the following ordinary differential equation (ODE):

with an initial condition s(0) = 0.

In this equation, u(t) is the input function that varies over time, and s(t) is the state of the system at time t that we are interested in predicting. In a physical scenario, u(t) could represent a force applied to a system, and s(t) might represent the system’s response, like its displacement or velocity, depending on the context. Our goal here is to learn the mapping between the forcing term u(t) and the ODE solution s(t).

Traditional numerical methods such as Euler’s method or Runge-Kutta methods can solve this equation effectively. However, notice that the forcing term u(t) can take various profiles, as shown by the following figure:

Figure 2. Example profiles of u(t). (Image by author)
Figure 2. Example profiles of u(t). (Image by author)

Consequently, every time u(t) changes, we would need to re-run the entire simulation to get the corresponding s(t) (as shown in Figure 3), which can be computationally intensive and inefficient.

Figure 3. Corresponding profiles of s(t). They are calculated by using the RK45 algorithm to solve the ODE. (Image by author)
Figure 3. Corresponding profiles of s(t). They are calculated by using the RK45 algorithm to solve the ODE. (Image by author)

So, how can we address this type of problem more efficiently?


2. Physics-informed DeepONet

As mentioned in the introduction, the physics-informed DeepONet constitutes a promising solution to our target problem. In this section, we’ll break down its fundamental concepts to make them more comprehensible.

We’ll first discuss the principles underpinning the original DeepONet. Following that, we’ll explore the concept of physics-informed Neural Networks and how it brings an added dimension to the problem-solving table. Finally, we’ll demonstrate how we can seamlessly integrate these two ideas to construct the physics-informed DeepONets.

2.1 DeepONet: An overview

DeepONet, short for Deep Operator Network, represents a new frontier in deep learning. Unlike traditional machine learning methods that map a set of input values to output values, DeepONet is designed to map entire functions to other functions. This makes DeepONet particularly powerful when dealing with problems that naturally involve functional inputs and outputs. So how exactly it achieves that goal?

To formulate what we want to achieve symbolically:

Figure 4. Our goal is to train a neural network to approximate the operator that maps the forcing term u(·) to the target output s(·), which are both a function of t. (Image by author)
Figure 4. Our goal is to train a neural network to approximate the operator that maps the forcing term u(·) to the target output s(·), which are both a function of t. (Image by author)

On the left, we have the operator G that maps from an input function u(·) to an output function s(·). On the right, we would like to use a neural network to approximate the operator __ G. Once this can be achieved, we could use the trained neural network to perform a fast calculation of s(·) given any u(·).

For the current case study, both the input function u(·) and the output function s(·) take time coordinate t as the sole argument. Therefore, the "input" and "output" of the neural network we aim to build should look like this:

Figure 5. The input and output for the neural network model we aim to train. (Image by author)
Figure 5. The input and output for the neural network model we aim to train. (Image by author)

Essentially, our neural network should accept the entire profile of u(t) as the first input, as well as a specific time instance t as the second input. Subsequently, it should output the target output function s(·) evaluated at time instance t, i.e., s(t).

To better understand this setup, we recognize that the value of s(t) firstly depends on the profile of s(·), which in turn depends on u(·), and secondly depends on at which time instance the s(·) is evaluated. This is also why time coordinate t is necessary to be among the inputs.

There are two questions we need to clear at the moment: first of all, how should we input a continuous profile of u(·) to the network? And secondly, how should we concatenate the two inputs, i.e., t and u(·).

1️⃣ How should we input a continuous profile of u(·)?

Well, we don’t actually. A straightforward solution is to represent the function u(·) discretely. More specifically, we simply evaluate u(·) values at sufficient but finite many locations and subsequently feed those discrete u(·) values into the neural network:

Figure 6. The u(·) profile is discretized before being fed into the neural network. (Image by author)
Figure 6. The u(·) profile is discretized before being fed into the neural network. (Image by author)

Those locations are referred to as sensors in the original DeepONet paper.

2️⃣ How should we concatenate the input t and u(·)?

At first sight, we might want to concatenate them directly at the input layer. However, it turns out that this naive approach will not only put a constraint on what types of neural networks we can use, but also lead to suboptimal prediction accuracy in practice. There is a better way though. Time to introduce DeepONet.

In a nutshell, DeepONet proposed a new network architecture for performing operator learning: it consists of two main components: a branch network and a trunk network. The branch network takes the discrete function values as inputs and transforms them into a feature vector. Meanwhile, the trunk network takes the coordinate(s) (in our current case study, the coordinate is just t. For PDEs, it will include both temporal and spatial coordinates) and also converts it/them into a feature vector with the same dimensions. These two feature vectors are then merged by a dot product, and the end result is used as the prediction of s(·) evaluated at the input coordinate.

Figure 7. A DeepONet consists of a branch net to handle the input function u(·) and a trunk net to handle the temporal/spatial coordinates. The outputs of two nets have the same dimensions and are merged via a dot product. Optionally, a bias term can be added after the dot product to further improve the model's expressibility. (Image by author)
Figure 7. A DeepONet consists of a branch net to handle the input function u(·) and a trunk net to handle the temporal/spatial coordinates. The outputs of two nets have the same dimensions and are merged via a dot product. Optionally, a bias term can be added after the dot product to further improve the model’s expressibility. (Image by author)

In the original DeepONet paper, the authors stated that this "divide-and-conquer" strategy, exemplified in separate "branch" and "trunk" networks, is inspired by the universal approximation theorem for operator, and serves to introduce a strong inductive bias specifically for operator learning. This is also the key point that makes DeepONet an effective solution, as claimed by the authors.

If you are curious to learn more about theoritical basis of DeepONet, please refer to Appendix A in the original paper.

One of the main strengths of DeepONet is its efficiency. Once trained, a DeepONet can infer the output function for a new input function in real-time, without the need for further training, as long as the new input function is within the range of input functions it was trained on. This makes DeepONet a powerful tool in applications that require real-time inference.

Another notable strength of DeepONet lies in its flexibility and versatility. While the most common choice for the trunk and branch networks might be fully-connected layers, the DeepONet framework permits a high level of architecture customization. Depending on the characteristics of the input function u(·) and the coordinates, a variety of neural network architectures such as CNN, RNN, etc. can also be employed. This adaptability makes DeepONet a highly versatile tool.

However, despite these strengths, the limitations of DeepONet are also prominent: as a purely data-driven method, DeepONet cannot guarantee that its predictions will follow prior knowledge or governing equations that describe the physical system under consideration. Consequently, DeepONet may not generalize well, especially when faced with input functions that lie outside the distribution of its training data, referred to as out-of-distribution (OOD) inputs. A common remedy for that is simply preparing a large amount of data for training, which might not always be feasible in practice, especially in scientific and engineering fields where data collection can be expensive or time-consuming.

So how should we address these limitations? Time to talk about physics-informed learning, and more specifically, physics-informed neural networks (PINNs).

2.2 Physics-informed Neural Networks (PINNs)

In traditional machine learning models, we rely primarily on data to learn the underlying patterns. However, in many scientific and engineering fields, governing equations (ODE/PDEs) that captured our prior knowledge about the dynamical system are available, and they present another source of information we can leverage besides the observed data. This additional knowledge source, if incorporated correctly, could potentially improve the model’s performance and generalization ability, especially when dealing with limited or noisy data. This is where physics-informed learning comes into play.

When we merge the concept of physics-informed learning and neural networks, we would then arrive at physics-informed neural networks (PINNs).

PINNs are a type of neural network where the network is trained to not only fit the data but also respect the known physical laws described by differential equations. This is achieved by introducing a ODE/PDE loss, which measures the degree of violation of the governing differential equations. This way, we inject the physical laws into the network training process and make it physically informed.

Figure 8. The loss function of a physics-informed neural network includes a contribution term of PDE loss, which effectively measures if the predicted solution satisfies the governing differential equation. Note that the derivative of the output with respect to the inputs can be easily calculated thanks to automatic differentiation. (Image by author)
Figure 8. The loss function of a physics-informed neural network includes a contribution term of PDE loss, which effectively measures if the predicted solution satisfies the governing differential equation. Note that the derivative of the output with respect to the inputs can be easily calculated thanks to automatic differentiation. (Image by author)

Though PINNs have proven to be effective in many applications, they are not without limitations. PINNs are typically trained for specific input parameters (e.g., boundary and initial conditions, external forcing, etc.). Consequently, whenever the input parameters have changed, we would need to retrain the PINN. Therefore, they are not particularly effective for real-time inference under different operating conditions.

Still remember which method is specifically designed for handling varying input parameters? That’s right, it’s the DeepONet! Time to combine the idea of physical-informed learning with DeepONet.

2.3 Physics-informed DeepONet

The main idea behind the Physics-informed DeepONet is to combine the strengths of both DeepONets and PINNs. Just like a DeepONet, a Physics-informed DeepONet is capable of taking a function as an input and producing a function as an output. This makes it highly efficient for real-time inference of new input functions, without the need for retraining.

On the other hand, like a PINN, a Physics-informed DeepONet incorporates known physical laws into its learning process. These laws are introduced as additional constraints in the loss function during training. This approach allows the model to make physically consistent predictions, even when dealing with limited or noisy data.

How do we achieve this integration? Similar to the PINNs, we add an extra loss contribution to measure how well the predictions of the model adhere to the known differential equation. By optimizing this loss function, the model learns to make predictions that are both data-consistent (if measurement data is provided during training) and physics-consistent.

Figure 10. Physics-informed DeepONet uses DeepONet as the backbone architecture while leveraging the concept of physics-informed learning to train the model. This way, the trained physics-informed DeepONet is both data and physics-consistent. (Image by author)
Figure 10. Physics-informed DeepONet uses DeepONet as the backbone architecture while leveraging the concept of physics-informed learning to train the model. This way, the trained physics-informed DeepONet is both data and physics-consistent. (Image by author)

In summary, the Physics-informed DeepONet is a powerful tool that combines the best of both worlds: the efficiency of the DeepONet and the accuracy of physics-informed learning. It represents a promising approach to solving complex problems in fields where both real-time inference and physical consistency are crucial.

In the next section, we will start working on our case study and turn theory into actual code.


3. Implementation of Physics-informed DeepONet

In this section, we will walk through how to define a Physics-informed DeepONet model to address our target case study. We will implement it in TensorFlow. Let’s start with importing the necessary libraries:

import numpy as np
import matplotlib.pyplot as plt

import tensorflow as tf
from tensorflow import keras
tf.random.set_seed(42)

3.1 Define the Architecture

As discussed previously, physics-informed DeepONet shares the same architecture as the original DeepONet. The following function defines the architecture for DeepONet:

def create_model(mean, var, verbose=False):
    """Definition of a DeepONet with fully connected branch and trunk layers.

    Args:
    ----
    mean: dictionary, mean values of the inputs
    var: dictionary, variance values of the inputs
    verbose: boolean, indicate whether to show the model summary

    Outputs:
    --------
    model: the DeepONet model
    """

    # Branch net
    branch_input = tf.keras.Input(shape=(len(mean['forcing'])), name="forcing")
    branch = tf.keras.layers.Normalization(mean=mean['forcing'], variance=var['forcing'])(branch_input)
    for i in range(3):
        branch = tf.keras.layers.Dense(50, activation="tanh")(branch)

    # Trunk net
    trunk_input = tf.keras.Input(shape=(len(mean['time'])), name="time")
    trunk = tf.keras.layers.Normalization(mean=mean['time'], variance=var['time'])(trunk_input)   
    for i in range(3):
        trunk = tf.keras.layers.Dense(50, activation="tanh")(trunk)

    # Compute the dot product between branch and trunk net
    dot_product = tf.reduce_sum(tf.multiply(branch, trunk), axis=1, keepdims=True)

    # Add the bias
    output = BiasLayer()(dot_product)

    # Create the model
    model = tf.keras.models.Model(inputs=[branch_input, trunk_input], outputs=output)

    if verbose:
        model.summary()

    return model   

In the code above:

  1. We assume that both the trunk and branch networks are fully connected networks, with 3 hidden layers, each containing 50 neurons and with tanh activation function. This architecture is chosen based on preliminary tests and should serve as a good starting point for this problem. Nevertheless, it is straightforward to replace it with other architectures (e.g., CNN, RNN, etc.) and other layer hyperparameters.
  2. The outputs of trunk and branch networks are merged via a dot product. As suggested in the original DeepONet paper, we add a bias term to improve the prediction accuracy. The BiasLayer() is a custom-defined class to achieve that goal:
class BiasLayer(tf.keras.layers.Layer):
    def build(self, input_shape):
        self.bias = self.add_weight(shape=(1,),
                                    initializer=tf.keras.initializers.Zeros(),
                                    trainable=True)
    def call(self, inputs):
        return inputs + self.bias

3.2 Define ODE loss

Next, we define a function to compute the ODE loss. Recall that our target ODE is:

Therefore, we can define the function as follows:

@tf.function
def ODE_residual_calculator(t, u, u_t, model):
    """ODE residual calculation.

    Args:
    ----
    t: temporal coordinate
    u: input function evaluated at discrete temporal coordinates
    u_t: input function evaluated at t
    model: DeepONet model

    Outputs:
    --------
    ODE_residual: residual of the governing ODE
    """

    with tf.GradientTape() as tape:
        tape.watch(t)
        s = model({"forcing": u, "time": t})

    # Calculate gradients
    ds_dt = tape.gradient(s, t)

    # ODE residual
    ODE_residual = ds_dt - u_t

    return ODE_residual

In the code above:

  1. We used tf.GradientTape() to calculate the gradient of s(·) with respect to t. Note that in TensorFlow, tf.GradientTape() is used as a context manager, and any operations executed within the tape’s context will be recorded by the tape. Here, we explicitly watch the variable t. As a result, TensorFlow will automatically track all operations that involve t, which in this case, it’s a forward running of the DeepONet model. Afterward, we use tape’s gradient() method to calculate the gradient of s(·) with respect to t.
  2. We included an extra input argument u_t, which denotes the value of the input function u(·) evaluated at t. This constitutes the right-hand-side term of our target ODE, and it is needed for calculating the ODE residual loss.
  3. We used @tf.function decorator to convert the regular Python function we just defined into a TensorFlow graph. It is useful to do that as gradient calculation can be quite expensive and executing it in Graph mode can significantly accelerate the computations.

3.3 Define gradient descent step

Next, we define the function to compile the total loss function and calculate the gradients of total loss with respect to the network model parameters:

@tf.function
def train_step(X, X_init, IC_weight, ODE_weight, model):
    """Calculate gradients of the total loss with respect to network model parameters.

    Args:
    ----
    X: training dataset for evaluating ODE residuals
    X_init: training dataset for evaluating initial conditions
    IC_weight: weight for initial condition loss
    ODE_weight: weight for ODE loss
    model: DeepONet model

    Outputs:
    --------
    ODE_loss: calculated ODE loss
    IC_loss: calculated initial condition loss
    total_loss: weighted sum of ODE loss and initial condition loss
    gradients: gradients of the total loss with respect to network model parameters.
    """
    with tf.GradientTape() as tape:
        tape.watch(model.trainable_weights)

        # Initial condition prediction
        y_pred_IC = model({"forcing": X_init[:, 1:-1], "time": X_init[:, :1]})

        # Equation residual
        ODE_residual = ODE_residual_calculator(t=X[:, :1], u=X[:, 1:-1], u_t=X[:, -1:], model=model)

        # Calculate loss
        IC_loss = tf.reduce_mean(keras.losses.mean_squared_error(0, y_pred_IC))
        ODE_loss = tf.reduce_mean(tf.square(ODE_residual))

        # Total loss
        total_loss = IC_loss*IC_weight + ODE_loss*ODE_weight

    gradients = tape.gradient(total_loss, model.trainable_variables)

    return ODE_loss, IC_loss, total_loss, gradients

In the code above:

  1. We only consider two loss terms: the loss associated with the initial condition IC_loss, and the ODE residual loss ODE_loss. The IC_loss is calculated by comparing the model-predicted s(t=0) with the known initial value of 0, and the ODE_loss is calculated by calling our previously defined ODE_residual_calculator function. Data loss can also be calculated and added to the total loss if the measured s(t) values are available (not implemented in the code above).
  2. In general, the total loss is a weighted sum of IC_loss and ODE_loss, where the weights control how much emphasis or priority is given to that individual loss terms during the training process. In our case study, it is sufficient to simply set both IC_weight and ODE_weight as 1.
  3. Similar to how we compute the ODE_loss, we also adopted tf.GradientTape() as the context manager to calculate the gradients. However, here we calculate the gradients of the total loss with respect to the network model parameters, which are necessary to perform gradient descent.

Before we proceed, let’s quickly summarize what we have developed so far:

1️⃣ We can initialize a DeepONet model with create_model() function.

2️⃣ We can calculate ODE residuals to assess how well the model predictions stick to the governing ODE. This is achieved with ODE_residual_calculator function.

3️⃣ We can calculate the total loss as well as its gradients with respect to the network model parameters with train_step.

Now the preparation work is half done 🚀 In the next section, we will discuss data generation and data organization issues (the strange X[:, :1]in the code above will hopefully become clear then). After that, we can finally train the model and see how it performs.


4. Data Generation and Organization

In this section, we discuss the generation of synthetic data and how to organize it for training the Physics-informed DeepONet model.

4.1 Generation of u(·) profiles

The data used for training, validation, and testing will be synthetically generated. The rationale behind this approach is twofold: it’s not only convenient but also allows for full control over the data’s characteristics.

In the context of our case study, we will generate the input function u(·)using a zero-mean Gaussian Process, with a radial basis function (RBF) kernel.

A Gaussian Process is a powerful mathematical framework commonly used in machine learning to model functions. The RBF kernel is a popular choice for capturing the similarity between input points. By using the RBF kernel within the Gaussian Process, we ensure that the generated synthetic data exhibits a smooth and continuous pattern, which is often desirable in various applications. To learn more about Gaussian Process, feel free to checkout my previous blog.

In scikit-learn, this can be achieved in just a few lines of code:

from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def create_samples(length_scale, sample_num):
    """Create synthetic data for u(·)

    Args:
    ----
    length_scale: float, length scale for RNF kernel
    sample_num: number of u(·) profiles to generate

    Outputs:
    --------
    u_sample: generated u(·) profiles
    """

    # Define kernel with given length scale
    kernel = RBF(length_scale)

    # Create Gaussian process regressor
    gp = GaussianProcessRegressor(kernel=kernel)

    # Collocation point locations
    X_sample = np.linspace(0, 1, 100).reshape(-1, 1) 

    # Create samples
    u_sample = np.zeros((sample_num, 100))
    for i in range(sample_num):
        # sampling from the prior directly
        n = np.random.randint(0, 10000)
        u_sample[i, :] = gp.sample_y(X_sample, random_state=n).flatten()  

    return u_sample

In the code above:

  1. We use length_scaleto control the shape of the generated function. For a RBF kernel, the Figure 11 shows the u(·) profile given different kernel length scales.
  2. Recall that we need to discretize u(·) before feeding it to the DeepONet. This is done by specifying aX_sample variable, which allocates 100 uniformly distributed points within our interested temporal domain.
  3. In scikit-learn, the GaussianProcessRegressor object exposes a sample_y method to allow drawing random samples from the Gaussian process with the length-scale-specified kernel. Note that we didn’t call .fit() before using the GaussianProcessRegressor object, which is unlike what we normally do with other scikit-learn regressors. This is intentional as we want GaussianProcessRegressor to use the exact length_scale we provided. If you call .fit() , the length_scalewill be optimized to another value to better fit whatever data is given.
  4. The output u_sample is a matrix with a dimension of sample_num * 100. Each row of u_samplerepresents one profile of u(·), which consists of 100 discrete values.
Figure 11. Synthetic u(·) profiles under different kernel length scales. (Image by author)
Figure 11. Synthetic u(·) profiles under different kernel length scales. (Image by author)

4.2 Generation of Dataset

Now we have generated the u(·) profiles, let’s focus on how to get the dataset organized such that it can be fed into the DeepONet model.

Recall that the DeepONet model we developed in the last section requires 3 inputs:

  1. the time coordinate t, which is a scalar between 0 and 1 (let’s not consider batch size for the moment);
  2. the profile of u(·), which is a vector that consists of u(·) values evaluated at pre-defined, fixed time coordinates between 0 and 1;
  3. the value of u(t), which is again a scalar. This u(t) value is used for calculating the ODE loss at time coordinate t.

Therefore, we can formulate a single sample like this:

(Image by author)
(Image by author)

Of course, for each u(·) profile (marked as green in the above illustration), we should consider multiple t‘s (and the corresponding u(t)’s) to evaluate the ODE loss to better impose the physical constraints. In theory, t can take any value within the considered temporal domain (i.e., between 0 and 1 for our case study). However, to simplify things, we will just consider t at the same temporal locations where u(·) profile is discretized. As a result, our updated dataset will be like this:

(Image by author)
(Image by author)

Note that the above discussion only considers a single u(·) profile. If we take into account all the u(·) profiles, our final dataset would look like this:

(Image by author)
(Image by author)

where N stands for the number of u(·) profiles. Now with that in mind, let’s see some code:

from tqdm import tqdm
from scipy.integrate import solve_ivp

def generate_dataset(N, length_scale, ODE_solve=False):
    """Generate dataset for Physics-informed DeepONet training.

    Args:
    ----
    N: int, number of u(·) profiles
    length_scale: float, length scale for RNF kernel
    ODE_solve: boolean, indicate whether to compute the corresponding s(·)

    Outputs:
    --------
    X: the dataset for t, u(·) profiles, and u(t)
    y: the dataset for the corresponding ODE solution s(·)
    """

    # Create random fields
    random_field = create_samples(length_scale, N)

    # Compile dataset
    X = np.zeros((N*100, 100+2))
    y = np.zeros((N*100, 1))

    for i in tqdm(range(N)):
        u = np.tile(random_field[i, :], (100, 1))
        t = np.linspace(0, 1, 100).reshape(-1, 1)

        # u(·) evaluated at t
        u_t = np.diag(u).reshape(-1, 1)

        # Update overall matrix
        X[i*100:(i+1)*100, :] = np.concatenate((t, u, u_t), axis=1)

        # Solve ODE
        if ODE_solve:
            sol = solve_ivp(lambda var_t, var_s: np.interp(var_t, t.flatten(), random_field[i, :]), 
                            t_span=[0, 1], y0=[0], t_eval=t.flatten(), method='RK45')
            y[i*100:(i+1)*100, :] = sol.y[0].reshape(-1, 1)

    return X, y

In the code above, we add one option of computing the corresponding s(·) for a given u(·) profile. Although we won’t use s(·) values in training, we would still need them for testing the model performance. The calculation of s(·) is achieved by using scipy.integrate.solve_ivp, which is an ODE solver from SciPy that is specifically designed to solve initial value problems.

Now we can generate the training, validation, and testing dataset. Note that for this case study, we will use a length scale of 0.4 to generate the u(·) profiles and train the physics-informed DeepONet.

# Create training dataset
N_train = 2000
length_scale_train = 0.4
X_train, y_train = generate_dataset(N_train, length_scale_train)

# Create validation dataset
N_val = 100
length_scale_test = 0.4
X_val, y_val = generate_dataset(N_val, length_scale_test)

# Create testing dataset
N_test = 100
length_scale_test = 0.4
X_test, y_test = generate_dataset(N_test, length_scale_test, ODE_solve=True)

4.3 Dataset Organization

Finally, we convert the NumPy array into the TensorFlow dataset objects to facilitate data ingestion.

# Determine batch size
ini_batch_size = int(2000/100)
col_batch_size = 2000

# Create dataset object (initial conditions)
X_train_ini = tf.convert_to_tensor(X_train[X_train[:, 0]==0], dtype=tf.float32)
ini_ds = tf.data.Dataset.from_tensor_slices((X_train_ini))
ini_ds = ini_ds.shuffle(5000).batch(ini_batch_size)

# Create dataset object (collocation points)
X_train = tf.convert_to_tensor(X_train, dtype=tf.float32)
train_ds = tf.data.Dataset.from_tensor_slices((X_train))
train_ds = train_ds.shuffle(100000).batch(col_batch_size)

# Scaling 
mean = {
    'forcing': np.mean(X_train[:, 1:-1], axis=0),
    'time': np.mean(X_train[:, :1], axis=0)
}

var = {
    'forcing': np.var(X_train[:, 1:-1], axis=0),
    'time': np.var(X_train[:, :1], axis=0)
}

In the code above, we create two distinct datasets: one for evaluating the ODE loss (train_ds), and the other for evaluating the initial condition loss (ini_ds). We have also pre-calculated the mean and variance values for t and u(·). Those values will be used to standardize the inputs.

That’s it for data generation and organization. Next up, we will kick off the model training and see how it performs.


5. Training Physics-informed DeepONet

As a first step, let’s create a custom class to track loss evolutions:

from collections import defaultdict

class LossTracking:

    def __init__(self):
        self.mean_total_loss = keras.metrics.Mean()
        self.mean_IC_loss = keras.metrics.Mean()
        self.mean_ODE_loss = keras.metrics.Mean()
        self.loss_history = defaultdict(list)

    def update(self, total_loss, IC_loss, ODE_loss):
        self.mean_total_loss(total_loss)
        self.mean_IC_loss(IC_loss)
        self.mean_ODE_loss(ODE_loss)

    def reset(self):
        self.mean_total_loss.reset_states()
        self.mean_IC_loss.reset_states()
        self.mean_ODE_loss.reset_states()

    def print(self):
        print(f"IC={self.mean_IC_loss.result().numpy():.4e}, 
              ODE={self.mean_ODE_loss.result().numpy():.4e}, 
              total_loss={self.mean_total_loss.result().numpy():.4e}")

    def history(self):
        self.loss_history['total_loss'].append(self.mean_total_loss.result().numpy())
        self.loss_history['IC_loss'].append(self.mean_IC_loss.result().numpy())
        self.loss_history['ODE_loss'].append(self.mean_ODE_loss.result().numpy())

Then, we define the main training/validation logic:

# Set up training configurations
n_epochs = 300
IC_weight= tf.constant(1.0, dtype=tf.float32)   
ODE_weight= tf.constant(1.0, dtype=tf.float32)
loss_tracker = LossTracking()
val_loss_hist = []

# Set up optimizer
optimizer = keras.optimizers.Adam(learning_rate=1e-3)

# Instantiate the PINN model
PI_DeepONet= create_model(mean, var)
PI_DeepONet.compile(optimizer=optimizer)

# Configure callbacks
_callbacks = [keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=30),
             tf.keras.callbacks.ModelCheckpoint('NN_model.h5', monitor='val_loss', save_best_only=True)]
callbacks = tf.keras.callbacks.CallbackList(
                _callbacks, add_history=False, model=PI_DeepONet)

# Start training process
for epoch in range(1, n_epochs + 1):  
    print(f"Epoch {epoch}:")

    for X_init, X in zip(ini_ds, train_ds):

        # Calculate gradients
        ODE_loss, IC_loss, total_loss, gradients = train_step(X, X_init, 
                                                            IC_weight, ODE_weight,
                                                            PI_DeepONet)
        # Gradient descent
        PI_DeepONet.optimizer.apply_gradients(zip(gradients, PI_DeepONet.trainable_variables))

        # Loss tracking
        loss_tracker.update(total_loss, IC_loss, ODE_loss)

    # Loss summary
    loss_tracker.history()
    loss_tracker.print()
    loss_tracker.reset()

    ####### Validation
    val_res = ODE_residual_calculator(X_val[:, :1], X_val[:, 1:-1], X_val[:, -1:], PI_DeepONet)
    val_ODE = tf.cast(tf.reduce_mean(tf.square(val_res)), tf.float32)

    X_val_ini = X_val[X_val[:, 0]==0]
    pred_ini_valid = PI_DeepONet.predict({"forcing": X_val_ini[:, 1:-1], "time": X_val_ini[:, :1]}, batch_size=12800)
    val_IC = tf.reduce_mean(keras.losses.mean_squared_error(0, pred_ini_valid))
    print(f"val_IC: {val_IC.numpy():.4e}, val_ODE: {val_ODE.numpy():.4e}, lr: {PI_DeepONet.optimizer.lr.numpy():.2e}")

    # Callback at the end of epoch
    callbacks.on_epoch_end(epoch, logs={'val_loss': val_IC+val_ODE})
    val_loss_hist.append(val_IC+val_ODE)

    # Re-shuffle dataset
    ini_ds = tf.data.Dataset.from_tensor_slices((X_train_ini))
    ini_ds = ini_ds.shuffle(5000).batch(ini_batch_size)

    train_ds = tf.data.Dataset.from_tensor_slices((X_train))
    train_ds = train_ds.shuffle(100000).batch(col_batch_size)

It’s a rather long chunk of code, but it should be self-explanatory as we have already covered all the important pieces.

To visualize the training performance, we can plot the loss convergence curves:

# History
fig, ax = plt.subplots(1, 3, figsize=(12, 4))
ax[0].plot(range(n_epochs), loss_tracker.loss_history['IC_loss'])
ax[1].plot(range(n_epochs), loss_tracker.loss_history['ODE_loss'])
ax[2].plot(range(n_epochs), val_loss_hist)
ax[0].set_title('IC Loss')
ax[1].set_title('ODE Loss')
ax[2].set_title('Val Loss')
for axs in ax:
    axs.set_yscale('log')

The training results look like this:

Figure 12. Loss convergence plot. (Image by author)
Figure 12. Loss convergence plot. (Image by author)

In addition, we can also see how the prediction accuracy for one specific target s(·) evolves during the training:

At the beginning of the training, we can see a visible discrepancy between the model prediction and ground truth. However, toward the end of the training, the predicted s(·) converged to the ground truth. This indicates that our physics-informed DeepONet learns properly.


6. Results Discussion

Once the training is completed, we can reload the saved weights and assess the performance.

Here we randomly picked three u(·) profiles from the testing dataset and compare the corresponding s(·) predicted by our physics-informed DeepONet as well as calculated by the numerical ODE solver. We can see that the predictions and ground truth are almost indistinguishable.

Figure 13. Three u(·) profiles are randomly selected from the testing dataset, which are shown on the upper row. The lower row displays the corresponding s(·) profiles. We can see that the results predicted by the physics-informed DeepONet are indistinguishable from the ground truth, which is calculated by numerical ODE solvers. (Image by author)
Figure 13. Three u(·) profiles are randomly selected from the testing dataset, which are shown on the upper row. The lower row displays the corresponding s(·) profiles. We can see that the results predicted by the physics-informed DeepONet are indistinguishable from the ground truth, which is calculated by numerical ODE solvers. (Image by author)

These results are quite incredible, considering the fact that we didn’t even use any observational data of s(·) (except the initial condition) to train the DeepONet. This shows that the governing ODE itself has provided sufficient "supervision" signal for the model to make accurate predictions.

Another interesting thing to assess is the so-called "out-of-distribution" prediction capability. Since we enforced the governing equation when training the DeepONet, we can expect the trained physics-informed DeepONet to still be able to make decent predictions when the u(·) profiles lie outside the distribution of the training u(·)’s.

To test that, we can generate u(·) profiles using a different length scale. The following results showed three u(·) profiles generated with a length scale of 0.6, as well as the predicted s(·)’s. These results are quite nice, considering that the physics-informed DeepONet is trained with a length scale of 0.4.

Figure 14. The trained physics-informed DeepONet displayed a certain level of out-of-distribution prediction capability. (Image by author)
Figure 14. The trained physics-informed DeepONet displayed a certain level of out-of-distribution prediction capability. (Image by author)

However, if we keep reducing the length scale to 0.2, we would notice that visible discrepancies start to appear. This indicates that there is a limit on the generalization capability of the trained physics-informed DeepONet.

Figure 15. There is a limit on how far can physics-informed DeepONet generalize. (Image by author)
Figure 15. There is a limit on how far can physics-informed DeepONet generalize. (Image by author)

Smaller length scales in general lead to more complex u(·) profiles, which would be quite different than the u(·) profiles used for training. This could explain why the trained model encountered challenges in making accurate predictions in smaller length-scale regions.

Figure 16. It's challenging for our trained model to generalize to smaller length-scale regions, as the u(·) profiles are more complex and distinct from the training data. (Image by author)
Figure 16. It’s challenging for our trained model to generalize to smaller length-scale regions, as the u(·) profiles are more complex and distinct from the training data. (Image by author)

Overall, we could say that the developed physics-informed DeepONet can properly learn the system dynamics and map from input function to output function given only the ODE constraints. In addition, physics-informed DeepONet displays a certain level of capability to handle "out-of-distribution" predictions, indicating that training the model to align with the governing ODE improves the model’s generalization capability.


7. Take-away

We’ve come a long way on our exploration of Physics-Informed DeepONet. From understanding the fundamental concepts of DeepONet and physics-informed learning, to seeing them in action through code implementation, we’ve covered a lot about this powerful method of solving differential equations.

Here are a few key take-aways:

1️⃣ DeepONet is a powerful framework to perform operator learning, thanks to its novel architecture of branch and trunk networks.

2️⃣ Physics-Informed Learning explicitly incorporates governing differential equations of the dynamical system into the learning process, thus possessing the potential of improving the model’s interpretability and generalization ability.

3️⃣ Physics-Informed DeepONet combines the strengths of DeepONet and physics-informed learning, and presents itself as a promising tool for learning functional mappings while adhering to the associated governing equations.

Hope you have enjoyed this deep dive into Physics-Informed DeepONet. Next up, we will shift our gears toward solving inverse problems with physics-informed DeepONet. Stay tuned!

If you find my content useful, you could buy me a coffee here 🤗 Thank you very much for your support!

You can find the companion notebook with full code here 💻

If you are also interested in using physics-informed DeepONet to solve inverse problems, feel free to check out my new blog here: Solving Inverse Problems With Physics-Informed DeepONet: A Practical Guide With Code Implementation

If you would like to keep up to date with the best practices of physics-informed learning, please take a look at the design pattern series I am currently running: Unraveling the Design Pattern of Physics-Informed Neural Networks

You can also subscribe to my newsletter or follow me on Medium.

Reference

[1] Lu et al., DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators. arXiv, 2019.

[2] Wang et al., Learning the solution operator of parametric partial differential equations with physics-informed DeepOnets. arXiv, 2021.


Related Articles