
Ordinary and partial differential equations (ODEs/PDEs) are the backbone of many disciplines in science and engineering, from physics and biology to economics and climate science. They are fundamental tools used to describe physical systems and processes, capturing the continuous change of quantities over time and space.
Yet, a unique trait of many of these equations is that they don’t just take single values as inputs; they take functions. For example, consider predicting the vibrations of a building during an earthquake. The shaking of the ground, which varies over time, can be represented as a function that acts as the input to the differential equation describing the building’s motion. Similarly, for sound waves propagating in a concert hall, the sound produced by a musical instrument can be an input function with varying volume and pitch over time. These varying input functions fundamentally influence the resulting output functions: the building’s vibrations and the acoustic field in the hall, respectively.
Traditionally, these ODEs/PDEs are tackled using numerical solvers like finite difference or finite element methods. However, these methods come with a bottleneck: for every new input function, the solver must be run all over again. This process can be computationally intensive and slow, particularly for intricate systems or high-dimensional inputs.
To address this challenge, a novel framework was introduced by Lu et al. in 2019: the Deep Operator Network, or DeepONet. DeepONets aim to learn the operator that maps input functions to output functions, essentially learning to predict the output of these ODEs/PDEs for any given input function without having to re-run a numerical solver each time.
But DeepONets, though powerful, inherited the common problems faced by data-driven methods: How can we ensure that the predictions of the network are in line with the known laws of physics encapsulated in the governing equations?
Enter physics-informed learning.
Physics-informed learning is a rapidly evolving branch of Machine Learning that combines physical principles with data science to enhance the modeling and understanding of complex physical systems. It involves leveraging domain-specific knowledge and physical laws to guide the learning process and improve the accuracy, generalization, and interpretability of machine learning models.
Under this framework, in 2021, Wang et al. introduced a new variant of DeepONets: the Physics-Informed DeepONet. This innovative approach builds on the foundation of DeepONets by incorporating our understanding of physical laws into the learning process. We’re no longer just asking our model to learn from data; we’re guiding it with principles derived from centuries of scientific inquiry.
This seems to be quite a promising approach! But how should we implement it in practice? That’s precisely what we’re going to explore today 🤗
In this blog, we’ll discuss the theory behind Physics-Informed DeepONet, and walk through how to implement it from scratch. We’ll also put our developed model into action and demonstrate its power through a hands-on case study.
If you are also interested in using physics-informed DeepONet to solve inverse problems, feel free to check out my new blog here: Solving Inverse Problems With Physics-Informed DeepONet: A Practical Guide With Code Implementation
Let’s get started!
Table of Contents · 1. Case study · 2. Physics-informed DeepONet ∘ 2.1 DeepONet: An overview ∘ 2.2 Physics-informed Neural Networks (PINNs) ∘ 2.3 Physics-informed DeepONet · 3. Implementation of Physics-informed DeepONet ∘ 3.1 Define the Architecture ∘ 3.2 Define ODE loss ∘ 3.3 Define gradient descent step · 4. Data Generation and Organization ∘ 4.1 Generation of u(·) profiles ∘ 4.2 Generation of Dataset ∘ 4.3 Dataset Organization · 5. Training Physics-informed DeepONet · 6. Results Discussion · 7. Take-away · References
1. Case study
Let’s ground our discussion in a concrete example. In this blog, we will reproduce the first case study considered in Wang et al.’s original paper, i.e., an initial value problem described by the following ordinary differential equation (ODE):
ds(t)/dt = u(t),  t ∈ [0, 1]
with an initial condition s(0) = 0.
In this equation, u(t) is the input function that varies over time, and s(t) is the state of the system at time t that we are interested in predicting. In a physical scenario, u(t) could represent a force applied to a system, and s(t) might represent the system’s response, such as its displacement or velocity, depending on the context. Our goal here is to learn the mapping between the forcing term u(t) and the ODE solution s(t); for this simple ODE, that mapping is just the antiderivative operator, i.e., s(t) = ∫₀ᵗ u(τ) dτ.
Traditional numerical methods such as Euler’s method or Runge-Kutta methods can solve this equation effectively. However, notice that the forcing term u(t) can take various profiles, as shown by the following figure:

Consequently, every time u(t) changes, we would need to re-run the entire simulation to obtain the corresponding s(t), which can be computationally intensive and inefficient.

So, how can we address this type of problem more efficiently?
2. Physics-informed DeepONet
As mentioned in the introduction, the physics-informed DeepONet constitutes a promising solution to our target problem. In this section, we’ll break down its fundamental concepts to make them more comprehensible.
We’ll first discuss the principles underpinning the original DeepONet. Following that, we’ll explore the concept of physics-informed Neural Networks and how it brings an added dimension to the problem-solving table. Finally, we’ll demonstrate how we can seamlessly integrate these two ideas to construct the physics-informed DeepONets.
2.1 DeepONet: An overview
DeepONet, short for Deep Operator Network, represents a new frontier in deep learning. Unlike traditional machine learning methods that map a set of input values to output values, DeepONet is designed to map entire functions to other functions. This makes it particularly powerful for problems that naturally involve functional inputs and outputs. So how exactly does it achieve that goal?
To formulate what we want to achieve symbolically:
G: u(·) ↦ s(·)        NN ≈ G
On the left, we have the operator G that maps from an input function u(·) to an output function s(·). On the right, we would like to use a neural network to approximate the operator G. Once this is achieved, we can use the trained network to perform fast calculations of s(·) for any given u(·).
For the current case study, both the input function u(·) and the output function s(·) take time coordinate t as the sole argument. Therefore, the "input" and "output" of the neural network we aim to build should look like this:
( u(·), t )  →  neural network  →  s(t)
Essentially, our neural network should accept the entire profile of u(t) as the first input, as well as a specific time instance t as the second input. Subsequently, it should output the target output function s(·) evaluated at time instance t, i.e., s(t).
To better understand this setup, note that the value of s(t) depends firstly on the profile of s(·), which is in turn determined by u(·), and secondly on the time instance at which s(·) is evaluated. This is why the time coordinate t must be among the inputs.
There are two questions we need to address at this point: first, how should we input a continuous profile of u(·) to the network? And second, how should we combine the two inputs, i.e., t and u(·)?
1️⃣ How should we input a continuous profile of u(·)?
Well, we don’t, actually. A straightforward solution is to represent the function u(·) discretely: we simply evaluate u(·) at a sufficient but finite number of locations and feed those discrete u(·) values into the neural network:

Those locations are referred to as sensors in the original DeepONet paper.
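To make this concrete, here is a minimal sketch of the discretization step, assuming 100 evenly spaced sensors on [0, 1] and an illustrative input profile u(t) = sin(2πt) (both choices are hypothetical; any profile and sensor layout would work):

import numpy as np

# 100 fixed sensor locations on [0, 1], shared by all u(·) profiles
sensors = np.linspace(0, 1, 100)

# An illustrative input function; in practice u(·) can be any profile
u = lambda t: np.sin(2*np.pi*t)

# The discrete representation of u(·) that the network actually sees
u_discrete = u(sensors)   # shape: (100,)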
2️⃣ How should we concatenate the input t and u(·)?
At first sight, we might want to concatenate them directly at the input layer. However, it turns out that this naive approach not only constrains the types of neural networks we can use, but also leads to suboptimal prediction accuracy in practice. There is a better way, though. Time to introduce DeepONet.
In a nutshell, DeepONet proposes a new network architecture for operator learning, consisting of two main components: a branch network and a trunk network. The branch network takes the discrete function values as inputs and transforms them into a feature vector. Meanwhile, the trunk network takes the coordinate(s) (in our current case study, the coordinate is just t; for PDEs, it would include both temporal and spatial coordinates) and converts it/them into a feature vector of the same dimension. These two feature vectors are then merged via a dot product, and the result is used as the prediction of s(·) evaluated at the input coordinate.
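Symbolically, if x₁, …, xₘ denote the sensor locations and p the shared feature dimension, the DeepONet prediction takes the form:

G(u)(t) ≈ Σₖ₌₁ᵖ bₖ(u(x₁), …, u(xₘ)) · trₖ(t) + b₀

where b₁, …, bₚ are the outputs of the branch net, tr₁, …, trₚ are the outputs of the trunk net, and b₀ is a bias term (more on this in the implementation section).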

In the original DeepONet paper, the authors state that this "divide-and-conquer" strategy, embodied in the separate "branch" and "trunk" networks, is inspired by the universal approximation theorem for operators, and serves to introduce a strong inductive bias specifically for operator learning. This is also the key ingredient that makes DeepONet effective, as claimed by the authors.
If you are curious to learn more about the theoretical basis of DeepONet, please refer to Appendix A of the original paper.
One of the main strengths of DeepONet is its efficiency. Once trained, a DeepONet can infer the output function for a new input function in real-time, without the need for further training, as long as the new input function is within the range of input functions it was trained on. This makes DeepONet a powerful tool in applications that require real-time inference.
Another notable strength of DeepONet lies in its flexibility and versatility. While the most common choice for the trunk and branch networks might be fully-connected layers, the DeepONet framework permits a high level of architecture customization. Depending on the characteristics of the input function u(·) and the coordinates, a variety of neural network architectures such as CNN, RNN, etc. can also be employed. This adaptability makes DeepONet a highly versatile tool.
However, despite these strengths, the limitations of DeepONet are also prominent: as a purely data-driven method, DeepONet cannot guarantee that its predictions will follow prior knowledge or governing equations that describe the physical system under consideration. Consequently, DeepONet may not generalize well, especially when faced with input functions that lie outside the distribution of its training data, referred to as out-of-distribution (OOD) inputs. A common remedy for that is simply preparing a large amount of data for training, which might not always be feasible in practice, especially in scientific and engineering fields where data collection can be expensive or time-consuming.
So how should we address these limitations? Time to talk about physics-informed learning, and more specifically, physics-informed neural networks (PINNs).
2.2 Physics-informed Neural Networks (PINNs)
In traditional machine learning models, we rely primarily on data to learn the underlying patterns. However, in many scientific and engineering fields, governing equations (ODEs/PDEs) that capture our prior knowledge about the dynamical system are available, and they constitute another source of information we can leverage besides observed data. This additional knowledge source, if incorporated correctly, can improve the model’s performance and generalization ability, especially when dealing with limited or noisy data. This is where physics-informed learning comes into play.
When we merge the concept of physics-informed learning and neural networks, we would then arrive at physics-informed neural networks (PINNs).
PINNs are a type of neural network trained not only to fit the data but also to respect the known physical laws described by differential equations. This is achieved by introducing an ODE/PDE loss, which measures the degree to which the governing differential equations are violated. This way, we inject the physical laws into the network training process and make it physics-informed.
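In its simplest form, the training objective then becomes a weighted sum of the individual contributions:

L_total = L_data + λ · L_ODE/PDE

where L_data measures the misfit against observed data (and initial/boundary conditions), L_ODE/PDE is the mean squared residual of the governing equations evaluated at a set of so-called collocation points, and the weight λ balances the two terms.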

Though PINNs have proven effective in many applications, they are not without limitations. PINNs are typically trained for specific input parameters (e.g., boundary and initial conditions, external forcing, etc.). Consequently, whenever the input parameters change, we would need to retrain the PINN. Therefore, they are not particularly effective for real-time inference under different operating conditions.
Still remember which method is specifically designed to handle varying input parameters? That’s right: the DeepONet! Time to combine the idea of physics-informed learning with DeepONet.
2.3 Physics-informed DeepONet
The main idea behind the Physics-informed DeepONet is to combine the strengths of both DeepONets and PINNs. Just like a DeepONet, a Physics-informed DeepONet is capable of taking a function as an input and producing a function as an output. This makes it highly efficient for real-time inference of new input functions, without the need for retraining.
On the other hand, like a PINN, a Physics-informed DeepONet incorporates known physical laws into its learning process. These laws are introduced as additional constraints in the loss function during training. This approach allows the model to make physically consistent predictions, even when dealing with limited or noisy data.
How do we achieve this integration? As with PINNs, we add an extra loss term that measures how well the model’s predictions adhere to the known differential equation. By optimizing this loss function, the model learns to make predictions that are both data-consistent (if measurement data is provided during training) and physics-consistent.

In summary, the Physics-informed DeepONet is a powerful tool that combines the best of both worlds: the efficiency of the DeepONet and the accuracy of physics-informed learning. It represents a promising approach to solving complex problems in fields where both real-time inference and physical consistency are crucial.
In the next section, we will start working on our case study and turn theory into actual code.
3. Implementation of Physics-informed DeepONet
In this section, we will walk through how to define a Physics-informed DeepONet model to address our target case study. We will implement it in TensorFlow. Let’s start with importing the necessary libraries:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
tf.random.set_seed(42)
3.1 Define the Architecture
As discussed previously, physics-informed DeepONet shares the same architecture as the original DeepONet. The following function defines the architecture for DeepONet:
def create_model(mean, var, verbose=False):
    """Definition of a DeepONet with fully connected branch and trunk layers.

    Args:
    ----
    mean: dictionary, mean values of the inputs
    var: dictionary, variance values of the inputs
    verbose: boolean, indicate whether to show the model summary

    Outputs:
    --------
    model: the DeepONet model
    """
    # Branch net
    branch_input = tf.keras.Input(shape=(len(mean['forcing']),), name="forcing")
    branch = tf.keras.layers.Normalization(mean=mean['forcing'], variance=var['forcing'])(branch_input)
    for i in range(3):
        branch = tf.keras.layers.Dense(50, activation="tanh")(branch)

    # Trunk net
    trunk_input = tf.keras.Input(shape=(len(mean['time']),), name="time")
    trunk = tf.keras.layers.Normalization(mean=mean['time'], variance=var['time'])(trunk_input)
    for i in range(3):
        trunk = tf.keras.layers.Dense(50, activation="tanh")(trunk)

    # Compute the dot product between branch and trunk net outputs
    dot_product = tf.reduce_sum(tf.multiply(branch, trunk), axis=1, keepdims=True)

    # Add a trainable bias, as suggested in the original DeepONet paper
    output = BiasLayer()(dot_product)

    # Create the model
    model = tf.keras.models.Model(inputs=[branch_input, trunk_input], outputs=output)

    if verbose:
        model.summary()

    return model
In the code above:
- We assume that both the trunk and branch networks are fully connected networks with 3 hidden layers, each containing 50 neurons with a tanh activation function. This architecture was chosen based on preliminary tests and should serve as a good starting point for this problem. Nevertheless, it is straightforward to swap in other architectures (e.g., CNN, RNN, etc.) and other layer hyperparameters.
- The outputs of the trunk and branch networks are merged via a dot product. As suggested in the original DeepONet paper, we add a bias term to improve the prediction accuracy. The BiasLayer() is a custom-defined class to achieve that goal:
class BiasLayer(tf.keras.layers.Layer):
    def build(self, input_shape):
        # A single trainable bias, initialized to zero
        self.bias = self.add_weight(shape=(1,),
                                    initializer=tf.keras.initializers.Zeros(),
                                    trainable=True)

    def call(self, inputs):
        return inputs + self.bias
3.2 Define ODE loss
Next, we define a function to compute the ODE loss. Recall that our target ODE is:
ds(t)/dt = u(t)
Therefore, we can define the function as follows:
@tf.function
def ODE_residual_calculator(t, u, u_t, model):
    """ODE residual calculation.

    Args:
    ----
    t: temporal coordinate
    u: input function evaluated at discrete temporal coordinates
    u_t: input function evaluated at t
    model: DeepONet model

    Outputs:
    --------
    ODE_residual: residual of the governing ODE
    """
    with tf.GradientTape() as tape:
        tape.watch(t)
        s = model({"forcing": u, "time": t})

    # Calculate ds/dt
    ds_dt = tape.gradient(s, t)

    # ODE residual: ds/dt - u(t)
    ODE_residual = ds_dt - u_t

    return ODE_residual
In the code above:
- We use tf.GradientTape() to calculate the gradient of s(·) with respect to t. Note that in TensorFlow, tf.GradientTape() is used as a context manager, and any operations executed within the tape’s context will be recorded by the tape. Here, we explicitly watch the variable t. As a result, TensorFlow will automatically track all operations that involve t, which in this case is a forward pass of the DeepONet model. Afterward, we use the tape’s gradient() method to calculate the gradient of s(·) with respect to t.
- We include an extra input argument u_t, which denotes the value of the input function u(·) evaluated at t. This constitutes the right-hand-side term of our target ODE and is needed for calculating the ODE residual loss.
- We use the @tf.function decorator to convert the regular Python function we just defined into a TensorFlow graph. This is useful because gradient calculation can be quite expensive, and executing it in graph mode can significantly accelerate the computation.
3.3 Define gradient descent step
Next, we define the function to compile the total loss function and calculate the gradients of total loss with respect to the network model parameters:
@tf.function
def train_step(X, X_init, IC_weight, ODE_weight, model):
    """Calculate gradients of the total loss with respect to network model parameters.

    Args:
    ----
    X: training dataset for evaluating ODE residuals
    X_init: training dataset for evaluating initial conditions
    IC_weight: weight for initial condition loss
    ODE_weight: weight for ODE loss
    model: DeepONet model

    Outputs:
    --------
    ODE_loss: calculated ODE loss
    IC_loss: calculated initial condition loss
    total_loss: weighted sum of ODE loss and initial condition loss
    gradients: gradients of the total loss with respect to network model parameters
    """
    with tf.GradientTape() as tape:
        tape.watch(model.trainable_weights)

        # Initial condition prediction
        y_pred_IC = model({"forcing": X_init[:, 1:-1], "time": X_init[:, :1]})

        # Equation residual
        ODE_residual = ODE_residual_calculator(t=X[:, :1], u=X[:, 1:-1], u_t=X[:, -1:], model=model)

        # Calculate losses
        IC_loss = tf.reduce_mean(keras.losses.mean_squared_error(0, y_pred_IC))
        ODE_loss = tf.reduce_mean(tf.square(ODE_residual))

        # Total loss
        total_loss = IC_loss*IC_weight + ODE_loss*ODE_weight

    gradients = tape.gradient(total_loss, model.trainable_variables)

    return ODE_loss, IC_loss, total_loss, gradients
In the code above:
- We consider only two loss terms: the loss associated with the initial condition, IC_loss, and the ODE residual loss, ODE_loss. IC_loss is calculated by comparing the model-predicted s(t=0) with the known initial value of 0, and ODE_loss is calculated by calling our previously defined ODE_residual_calculator function. A data loss could also be calculated and added to the total loss if measured s(t) values were available (not implemented in the code above).
- In general, the total loss is a weighted sum of IC_loss and ODE_loss, where the weights control how much emphasis or priority is given to the individual loss terms during training. In our case study, it is sufficient to simply set both IC_weight and ODE_weight to 1.
- Similar to how we compute ODE_loss, we again adopt tf.GradientTape() as the context manager to calculate gradients. Here, however, we calculate the gradients of the total loss with respect to the network model parameters, which are needed to perform gradient descent.
Before we proceed, let’s quickly summarize what we have developed so far:
1️⃣ We can initialize a DeepONet model with the create_model() function.
2️⃣ We can calculate ODE residuals to assess how well the model predictions stick to the governing ODE. This is achieved with the ODE_residual_calculator function.
3️⃣ We can calculate the total loss as well as its gradients with respect to the network model parameters with train_step.
Now the preparation work is half done 🚀 In the next section, we will discuss data generation and organization (the strange X[:, :1] indexing in the code above will hopefully become clear then). After that, we can finally train the model and see how it performs.
4. Data Generation and Organization
In this section, we discuss the generation of synthetic data and how to organize it for training the Physics-informed DeepONet model.
4.1 Generation of u(·) profiles
The data used for training, validation, and testing will be synthetically generated. The rationale behind this approach is twofold: it’s not only convenient but also allows for full control over the data’s characteristics.
In the context of our case study, we will generate the input function u(·) using a zero-mean Gaussian Process with a radial basis function (RBF) kernel.
A Gaussian Process is a powerful mathematical framework commonly used in machine learning to model functions. The RBF kernel is a popular choice for capturing the similarity between input points. By using the RBF kernel within the Gaussian Process, we ensure that the generated synthetic data exhibits a smooth and continuous pattern, which is often desirable in various applications. To learn more about Gaussian Processes, feel free to check out my previous blog.
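For reference, the RBF kernel (in scikit-learn’s convention) measures the similarity between two time points t and t′ as:

k(t, t′) = exp(−(t − t′)² / (2 · length_scale²))

so a larger length_scale forces nearby u(·) values to be more strongly correlated, yielding smoother profiles.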
In scikit-learn, this can be achieved in just a few lines of code:
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def create_samples(length_scale, sample_num):
    """Create synthetic data for u(·)

    Args:
    ----
    length_scale: float, length scale for the RBF kernel
    sample_num: number of u(·) profiles to generate

    Outputs:
    --------
    u_sample: generated u(·) profiles
    """
    # Define kernel with given length scale
    kernel = RBF(length_scale)

    # Create Gaussian process regressor
    gp = GaussianProcessRegressor(kernel=kernel)

    # Collocation point locations
    X_sample = np.linspace(0, 1, 100).reshape(-1, 1)

    # Create samples
    u_sample = np.zeros((sample_num, 100))
    for i in range(sample_num):
        # Sample from the prior directly
        n = np.random.randint(0, 10000)
        u_sample[i, :] = gp.sample_y(X_sample, random_state=n).flatten()

    return u_sample
In the code above:
- We use length_scale to control the shape of the generated function: for an RBF kernel, a larger length scale yields smoother u(·) profiles, while a smaller one yields more rapidly varying profiles.
- Recall that we need to discretize u(·) before feeding it to the DeepONet. This is done by specifying the X_sample variable, which allocates 100 uniformly distributed points within our temporal domain of interest.
- In scikit-learn, the GaussianProcessRegressor object exposes a sample_y method that allows drawing random samples from the Gaussian process with the specified kernel length scale. Note that we don’t call .fit() before using the GaussianProcessRegressor object, unlike what we normally do with other scikit-learn regressors. This is intentional, as we want GaussianProcessRegressor to use the exact length_scale we provided; if you call .fit(), the length_scale will be optimized to a different value to better fit whatever data is given.
- The output u_sample is a matrix with dimension sample_num × 100. Each row of u_sample represents one profile of u(·), consisting of 100 discrete values.
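As a quick sanity check, here is a minimal usage sketch (purely illustrative, not part of the training pipeline) that draws and plots a few u(·) profiles:

# Draw 3 random u(·) profiles with a length scale of 0.4
u_sample = create_samples(length_scale=0.4, sample_num=3)

# Visualize them over the temporal domain [0, 1]
t_grid = np.linspace(0, 1, 100)
for i in range(u_sample.shape[0]):
    plt.plot(t_grid, u_sample[i, :])
plt.xlabel('t')
plt.ylabel('u(t)')
plt.show()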

4.2 Generation of Dataset
Now that we have generated the u(·) profiles, let’s focus on how to organize the dataset so that it can be fed into the DeepONet model.
Recall that the DeepONet model we developed in the last section requires 3 inputs:
- the time coordinate t, which is a scalar between 0 and 1 (let’s not consider batch size for the moment);
- the profile of u(·), which is a vector that consists of u(·) values evaluated at pre-defined, fixed time coordinates between 0 and 1;
- the value of u(t), which is again a scalar. This u(t) value is used for calculating the ODE loss at time coordinate t.
Therefore, we can formulate a single sample like this:
( t | u(t₁), u(t₂), …, u(t₁₀₀) | u(t) )
Of course, for each u(·) profile, we should consider multiple t’s (and the corresponding u(t)’s) to evaluate the ODE loss and better impose the physical constraints. In theory, t can take any value within the considered temporal domain (i.e., between 0 and 1 for our case study). To keep things simple, however, we will only consider t at the same temporal locations where the u(·) profile is discretized. As a result, our updated dataset looks like this:
( t₁ | u(t₁), …, u(t₁₀₀) | u(t₁) )
( t₂ | u(t₁), …, u(t₁₀₀) | u(t₂) )
⋮
( t₁₀₀ | u(t₁), …, u(t₁₀₀) | u(t₁₀₀) )
Note that the above discussion only considers a single u(·) profile. If we take into account all the u(·) profiles, our final dataset would look like this:
( t₁ | u⁽¹⁾(t₁), …, u⁽¹⁾(t₁₀₀) | u⁽¹⁾(t₁) )
⋮
( t₁₀₀ | u⁽¹⁾(t₁), …, u⁽¹⁾(t₁₀₀) | u⁽¹⁾(t₁₀₀) )
⋮
( t₁ | u⁽ᴺ⁾(t₁), …, u⁽ᴺ⁾(t₁₀₀) | u⁽ᴺ⁾(t₁) )
⋮
( t₁₀₀ | u⁽ᴺ⁾(t₁), …, u⁽ᴺ⁾(t₁₀₀) | u⁽ᴺ⁾(t₁₀₀) )
where N stands for the number of u(·) profiles. Now with that in mind, let’s see some code:
from tqdm import tqdm
from scipy.integrate import solve_ivp

def generate_dataset(N, length_scale, ODE_solve=False):
    """Generate dataset for Physics-informed DeepONet training.

    Args:
    ----
    N: int, number of u(·) profiles
    length_scale: float, length scale for the RBF kernel
    ODE_solve: boolean, indicate whether to compute the corresponding s(·)

    Outputs:
    --------
    X: the dataset for t, u(·) profiles, and u(t)
    y: the dataset for the corresponding ODE solution s(·)
    """
    # Create random fields
    random_field = create_samples(length_scale, N)

    # Compile dataset
    X = np.zeros((N*100, 100+2))
    y = np.zeros((N*100, 1))

    for i in tqdm(range(N)):
        u = np.tile(random_field[i, :], (100, 1))
        t = np.linspace(0, 1, 100).reshape(-1, 1)

        # u(·) evaluated at t
        u_t = np.diag(u).reshape(-1, 1)

        # Update overall matrix
        X[i*100:(i+1)*100, :] = np.concatenate((t, u, u_t), axis=1)

        # Solve ODE
        if ODE_solve:
            sol = solve_ivp(lambda var_t, var_s: np.interp(var_t, t.flatten(), random_field[i, :]),
                            t_span=[0, 1], y0=[0], t_eval=t.flatten(), method='RK45')
            y[i*100:(i+1)*100, :] = sol.y[0].reshape(-1, 1)

    return X, y
In the code above, we add an option to compute the corresponding s(·) for a given u(·) profile. Although we won’t use the s(·) values in training, we still need them for testing the model performance. The calculation of s(·) is performed with scipy.integrate.solve_ivp, an ODE solver from SciPy specifically designed for initial value problems.
Now we can generate the training, validation, and testing dataset. Note that for this case study, we will use a length scale of 0.4 to generate the u(·) profiles and train the physics-informed DeepONet.
# Create training dataset
N_train = 2000
length_scale_train = 0.4
X_train, y_train = generate_dataset(N_train, length_scale_train)
# Create validation dataset
N_val = 100
length_scale_test = 0.4
X_val, y_val = generate_dataset(N_val, length_scale_test)
# Create testing dataset
N_test = 100
length_scale_test = 0.4
X_test, y_test = generate_dataset(N_test, length_scale_test, ODE_solve=True)
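Before moving on, a quick bookkeeping check (hypothetical, not required for training): with N_train = 2000 profiles and 100 collocation points per profile, X_train should contain 2000·100 = 200,000 rows and 1 + 100 + 1 = 102 columns (t, the discretized u(·), and u(t)):

print(X_train.shape)   # expected: (200000, 102)
print(X_test.shape)    # expected: (10000, 102)
print(y_test.shape)    # expected: (10000, 1), solver-computed s(t) values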
4.3 Dataset Organization
Finally, we convert the NumPy array into the TensorFlow dataset objects to facilitate data ingestion.
# Determine batch size
ini_batch_size = int(2000/100)
col_batch_size = 2000
# Create dataset object (initial conditions)
X_train_ini = tf.convert_to_tensor(X_train[X_train[:, 0]==0], dtype=tf.float32)
ini_ds = tf.data.Dataset.from_tensor_slices((X_train_ini))
ini_ds = ini_ds.shuffle(5000).batch(ini_batch_size)
# Create dataset object (collocation points)
X_train = tf.convert_to_tensor(X_train, dtype=tf.float32)
train_ds = tf.data.Dataset.from_tensor_slices((X_train))
train_ds = train_ds.shuffle(100000).batch(col_batch_size)
# Scaling
mean = {
'forcing': np.mean(X_train[:, 1:-1], axis=0),
'time': np.mean(X_train[:, :1], axis=0)
}
var = {
'forcing': np.var(X_train[:, 1:-1], axis=0),
'time': np.var(X_train[:, :1], axis=0)
}
In the code above, we create two distinct dataset objects: one for evaluating the ODE loss (train_ds), and the other for evaluating the initial condition loss (ini_ds). We have also pre-calculated the mean and variance values of t and u(·); those values will be used to standardize the inputs.
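As a quick illustration (again hypothetical), drawing one batch from each dataset object shows the shapes the training loop will consume:

# Inspect one batch of collocation points and one batch of initial-condition points
for X_batch in train_ds.take(1):
    print(X_batch.shape)       # expected: (2000, 102)
for X_ini_batch in ini_ds.take(1):
    print(X_ini_batch.shape)   # expected: (20, 102)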
That’s it for data generation and organization. Next up, we will kick off the model training and see how it performs.
5. Training Physics-informed DeepONet
As a first step, let’s create a custom class to track loss evolutions:
from collections import defaultdict

class LossTracking:

    def __init__(self):
        self.mean_total_loss = keras.metrics.Mean()
        self.mean_IC_loss = keras.metrics.Mean()
        self.mean_ODE_loss = keras.metrics.Mean()
        self.loss_history = defaultdict(list)

    def update(self, total_loss, IC_loss, ODE_loss):
        self.mean_total_loss(total_loss)
        self.mean_IC_loss(IC_loss)
        self.mean_ODE_loss(ODE_loss)

    def reset(self):
        self.mean_total_loss.reset_states()
        self.mean_IC_loss.reset_states()
        self.mean_ODE_loss.reset_states()

    def print(self):
        print(f"IC={self.mean_IC_loss.result().numpy():.4e}, "
              f"ODE={self.mean_ODE_loss.result().numpy():.4e}, "
              f"total_loss={self.mean_total_loss.result().numpy():.4e}")

    def history(self):
        self.loss_history['total_loss'].append(self.mean_total_loss.result().numpy())
        self.loss_history['IC_loss'].append(self.mean_IC_loss.result().numpy())
        self.loss_history['ODE_loss'].append(self.mean_ODE_loss.result().numpy())
Then, we define the main training/validation logic:
# Set up training configurations
n_epochs = 300
IC_weight = tf.constant(1.0, dtype=tf.float32)
ODE_weight = tf.constant(1.0, dtype=tf.float32)
loss_tracker = LossTracking()
val_loss_hist = []

# Set up optimizer
optimizer = keras.optimizers.Adam(learning_rate=1e-3)

# Instantiate the PINN model
PI_DeepONet = create_model(mean, var)
PI_DeepONet.compile(optimizer=optimizer)

# Configure callbacks
_callbacks = [keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=30),
              tf.keras.callbacks.ModelCheckpoint('NN_model.h5', monitor='val_loss', save_best_only=True)]
callbacks = tf.keras.callbacks.CallbackList(
    _callbacks, add_history=False, model=PI_DeepONet)

# Start training process
for epoch in range(1, n_epochs + 1):
    print(f"Epoch {epoch}:")

    for X_init, X in zip(ini_ds, train_ds):

        # Calculate gradients
        ODE_loss, IC_loss, total_loss, gradients = train_step(X, X_init,
                                                              IC_weight, ODE_weight,
                                                              PI_DeepONet)
        # Gradient descent
        PI_DeepONet.optimizer.apply_gradients(zip(gradients, PI_DeepONet.trainable_variables))

        # Loss tracking
        loss_tracker.update(total_loss, IC_loss, ODE_loss)

    # Loss summary
    loss_tracker.history()
    loss_tracker.print()
    loss_tracker.reset()

    ####### Validation
    val_res = ODE_residual_calculator(X_val[:, :1], X_val[:, 1:-1], X_val[:, -1:], PI_DeepONet)
    val_ODE = tf.cast(tf.reduce_mean(tf.square(val_res)), tf.float32)

    X_val_ini = X_val[X_val[:, 0]==0]
    pred_ini_valid = PI_DeepONet.predict({"forcing": X_val_ini[:, 1:-1], "time": X_val_ini[:, :1]}, batch_size=12800)
    val_IC = tf.reduce_mean(keras.losses.mean_squared_error(0, pred_ini_valid))
    print(f"val_IC: {val_IC.numpy():.4e}, val_ODE: {val_ODE.numpy():.4e}, lr: {PI_DeepONet.optimizer.lr.numpy():.2e}")

    # Callback at the end of epoch
    callbacks.on_epoch_end(epoch, logs={'val_loss': val_IC+val_ODE})
    val_loss_hist.append(val_IC+val_ODE)

    # Re-shuffle datasets
    ini_ds = tf.data.Dataset.from_tensor_slices((X_train_ini))
    ini_ds = ini_ds.shuffle(5000).batch(ini_batch_size)

    train_ds = tf.data.Dataset.from_tensor_slices((X_train))
    train_ds = train_ds.shuffle(100000).batch(col_batch_size)
It’s a rather long chunk of code, but it should be self-explanatory as we have already covered all the important pieces.
To visualize the training performance, we can plot the loss convergence curves:
# History
fig, ax = plt.subplots(1, 3, figsize=(12, 4))
ax[0].plot(range(n_epochs), loss_tracker.loss_history['IC_loss'])
ax[1].plot(range(n_epochs), loss_tracker.loss_history['ODE_loss'])
ax[2].plot(range(n_epochs), val_loss_hist)
ax[0].set_title('IC Loss')
ax[1].set_title('ODE Loss')
ax[2].set_title('Val Loss')
for axs in ax:
    axs.set_yscale('log')
The training results look like this:

In addition, we can also see how the prediction accuracy for one specific target s(·) evolves during the training:
At the beginning of the training, there is a visible discrepancy between the model prediction and the ground truth. Toward the end of the training, however, the predicted s(·) converges to the ground truth. This indicates that our physics-informed DeepONet is learning properly.
6. Results Discussion
Once the training is completed, we can reload the saved weights and assess the performance.
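As a minimal sketch of this evaluation step (assuming the NN_model.h5 checkpoint written by the ModelCheckpoint callback above, and picking the first test profile for illustration):

# Reload the best weights saved during training
PI_DeepONet.load_weights('NN_model.h5')

# Predict s(·) for the first u(·) profile in the testing dataset
X_sample = X_test[:100, :]   # the first 100 rows correspond to one u(·) profile
s_pred = PI_DeepONet.predict({"forcing": X_sample[:, 1:-1], "time": X_sample[:, :1]})

# Compare against the solver-computed ground truth
plt.plot(X_sample[:, 0], y_test[:100, 0], label='Ground truth (solve_ivp)')
plt.plot(X_sample[:, 0], s_pred.flatten(), '--', label='PI-DeepONet prediction')
plt.xlabel('t')
plt.ylabel('s(t)')
plt.legend()
plt.show()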
Here, we randomly pick three u(·) profiles from the testing dataset and compare the corresponding s(·) predicted by our physics-informed DeepONet with the s(·) calculated by the numerical ODE solver. The predictions and the ground truth are almost indistinguishable.

These results are quite remarkable, considering that we didn’t use any observational data of s(·) (except the initial condition) to train the DeepONet. This shows that the governing ODE itself provides a sufficient "supervision" signal for the model to make accurate predictions.
Another interesting thing to assess is the so-called "out-of-distribution" prediction capability. Since we enforced the governing equation when training the DeepONet, we can expect the trained physics-informed DeepONet to still be able to make decent predictions when the u(·) profiles lie outside the distribution of the training u(·)’s.
To test that, we can generate u(·) profiles using a different length scale. The following results show three u(·) profiles generated with a length scale of 0.6, along with the corresponding predicted s(·)’s. The predictions are quite good, considering that the physics-informed DeepONet was trained with a length scale of 0.4.
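A minimal sketch of how such an out-of-distribution test set can be produced with the functions defined earlier (the plotting follows the same pattern as above and is omitted):

# Generate 3 out-of-distribution u(·) profiles (training used length_scale=0.4)
X_ood, y_ood = generate_dataset(N=3, length_scale=0.6, ODE_solve=True)

# Predict the corresponding s(·) with the trained model
s_ood_pred = PI_DeepONet.predict({"forcing": X_ood[:, 1:-1], "time": X_ood[:, :1]})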

However, if we keep reducing the length scale to 0.2, visible discrepancies start to appear. This indicates that there is a limit to the generalization capability of the trained physics-informed DeepONet.

Smaller length scales generally lead to more complex u(·) profiles, which are quite different from the u(·) profiles used for training. This could explain why the trained model struggles to make accurate predictions in the small length-scale regime.

Overall, we can say that the developed physics-informed DeepONet properly learns the system dynamics and maps from the input function to the output function given only the ODE constraint. In addition, it displays a certain level of "out-of-distribution" prediction capability, indicating that training the model to align with the governing ODE improves its generalization.
7. Take-away
We’ve come a long way in our exploration of Physics-Informed DeepONet. From understanding the fundamental concepts of DeepONet and physics-informed learning to seeing them in action through code, we’ve covered a lot about this powerful method for solving differential equations.
Here are a few key take-aways:
1️⃣ DeepONet is a powerful framework to perform operator learning, thanks to its novel architecture of branch and trunk networks.
2️⃣ Physics-informed learning explicitly incorporates the governing differential equations of the dynamical system into the learning process, and thus has the potential to improve the model’s interpretability and generalization ability.
3️⃣ Physics-Informed DeepONet combines the strengths of DeepONet and physics-informed learning, and presents itself as a promising tool for learning functional mappings while adhering to the associated governing equations.
Hope you have enjoyed this deep dive into Physics-Informed DeepONet. Next up, we will shift gears toward solving inverse problems with physics-informed DeepONet. Stay tuned!
If you find my content useful, you could buy me a coffee here 🤗 Thank you very much for your support!
You can find the companion notebook with full code here 💻
If you are also interested in using physics-informed DeepONet to solve inverse problems, feel free to check out my new blog here: Solving Inverse Problems With Physics-Informed DeepONet: A Practical Guide With Code Implementation
If you would like to keep up to date with the best practices of physics-informed learning, please take a look at the design pattern series I am currently running: Unraveling the Design Pattern of Physics-Informed Neural Networks
You can also subscribe to my newsletter or follow me on Medium.
References
[1] Lu et al., DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators. arXiv, 2019.
[2] Wang et al., Learning the solution operator of parametric partial differential equations with physics-informed DeepONets. arXiv, 2021.