Physics Informed Neural Networks (PINNs): An Intuitive Guide

The (readable) what, how, and why of physics informed neural networks

Ian Henderson
Towards Data Science



If you’ve ever tried to read the existing literature on physics informed neural networks (PINNs), it’s a tough read! It is either full of equations that will be unfamiliar to most readers and assumes you are already an expert in all of the concepts, or it is too simplistic to give a good understanding. This post aims to walk through PINNs in an intuitive way, and also suggests some improvements over the current literature.

Traditional physics model creation is a task for a domain expert, who parametrises physics models to best fit a system of interest. For example, creating a model of aircraft dynamics using equations for drag, lift, gravity, thrust, etc., and tuning the parameters so that the model closely matches a specific aircraft.

The purely data-driven approach is to attempt to learn the model with a neural network, using supervised learning on data obtained from the specific system.

Physics Informed Neural Networks (PINNs) lie at the intersection of the two: they use data-driven supervised learning to fit the model, while also using physics equations given to the model to encourage consistency with the known physics of the system. They have the advantage of being data-driven, while also being able to ensure consistency with the physics and to extrapolate accurately beyond the available data. As such, PINNs are able to generate more robust models with less data.

PINNs lie at the intersection between neural networks and physics. Image by Author

An understanding of neural networks, kinematics, and ordinary and partial differential equations will be very useful to fully digest the content on this page, but not essential to be able to gain an intuitive understanding.

Most examples of PINNs in the literature are based on physics equations such as fluid motion (Navier–Stokes) or light and wave propagation (the nonlinear Schrödinger and Korteweg–De Vries equations), and consider functions with respect to time [1].

To gain some intuition, we shall explore PINNs using the laws of motion. More concretely, we shall use projectile motion as an example, as it is simple enough to explore easily, yet complex enough to cover the various aspects of PINNs.

By the end of this article, we shall understand how a PINN works, and what the trade-offs and differences are between PINNs, purely data-driven neural networks, and pure physics functions.

Projectile Motion (time-based)

Let us consider the function of projectile motion (a projectile in free-fall) with the effects of gravity and drag. We know something about this problem already, in terms of the physics equations that describe projectile motion, and the relationships between them.

The displacement vector (position) of the projectile at time t is defined by the following function:

$\mathbf{s}(t) = \big(s_x(t),\, s_y(t)\big)$

The first derivative of displacement gives the velocity vector, defined as:

$\mathbf{v}(t) = \frac{d\mathbf{s}}{dt}$

And the second derivative gives acceleration, defined as:

$\mathbf{a}(t) = \frac{d\mathbf{v}}{dt} = \frac{d^2\mathbf{s}}{dt^2}$

The acceleration of a projectile is given by:

$\mathbf{a}(t) = -\mu\,\|\mathbf{v}(t)\|\,\mathbf{v}(t) + \mathbf{g}$

Where μ is the coefficient of drag (unknown variable), and g is the gravity vector (known variable). Essentially, at any point in time the projectile is decelerated in proportion to the square of its speed, in the direction opposing travel (drag), while also being pulled down by gravity.

Intuitively, derivatives tell us the rate of change of a function. As an example, when the derivative of velocity (acceleration) is 0, it tells us that the velocity is not changing. When acceleration is positive, velocity is increasing. When acceleration is negative, velocity is decreasing.

We can notice that the acceleration depends on the velocity. As such, there is no closed-form solution for the displacement: no finite expression gives us the displacement at a given time t. We must instead resort to numerical integration methods such as Runge–Kutta.
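To make this concrete, here is a minimal sketch of generating such a trajectory by numerically integrating the acceleration equation above. The drag coefficient, gravity vector, initial state, and time span are illustrative assumptions rather than values from the article.

import numpy as np
from scipy.integrate import solve_ivp

mu = 0.05                     # assumed drag coefficient
g = np.array([0.0, -9.81])    # gravity vector, pointing downwards

def dynamics(t, state):
    # state = [s_x, s_y, v_x, v_y]
    v = state[2:]
    a = -mu * np.linalg.norm(v) * v + g
    return np.concatenate([v, a])

# initial displacement (0, 0) m and velocity (20, 20) m/s, simulated for 3 s
sol = solve_ivp(dynamics, (0.0, 3.0), [0.0, 0.0, 20.0, 20.0],
                t_eval=np.linspace(0.0, 3.0, 100), method="RK45")
s_true = sol.y[:2].T          # ground-truth displacement samples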

A single ground-truth displacement of a projectile in motion (trajectory) with an initial position and velocity, under the influence of gravity and drag. Image by Author

In An Ideal World…

The rudimentary approach to training a neural network involves vanilla supervised learning, using perfect data across the entire domain of interest.

Here, the network’s objective is to learn $f(t) \approx \mathbf{s}(t)$, a mapping directly from time to displacement, using supervised learning on the available (t, s) pairs.

Note: all the GIFs on this page show the output of the function being learnt through training.

Training a neural network on perfect data from the entire domain. The neural network is able to quickly learn an accurate model of the trajectory. Image by Author
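For reference, a minimal sketch of this purely data-driven baseline is shown below: a small fully connected network trained with a plain MSE loss on (t, s) pairs. The layer sizes and GELU activations follow the architecture described later in this article; the optimiser settings, epoch count, and the t_train / s_train arrays are assumptions.

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="gelu", input_shape=(1,)),
    tf.keras.layers.Dense(128, activation="gelu"),
    tf.keras.layers.Dense(2),  # predicts (s_x, s_y) for a given time t
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="mse")

# t_train: shape (N, 1) times, s_train: shape (N, 2) displacements
model.fit(t_train, s_train, epochs=2000, verbose=0)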

So problem solved, right? If you’ve ever applied machine learning to a real world problem, you will surely be laughing right now :)

Realistically…

However, having access to perfect data across the entire domain is rarely achievable. More realistically, we have access to data that is noisy, sparse, and incomplete. Taking the rudimentary approach to supervised learning in this case gives us undesirable models.

Training a neural network on noisy, sparse, and incomplete data. The neural network does exactly what’s asked of it, and fits a function against the data we have provided. However, the function is not all that useful towards its intended use of predicting displacement of the projectile. We can say it has “over-fit”. Image by Author

Regularisation

A typical approach to combat this issue is to use regularisation.

L2 regularisation is given by:

$\mathcal{L}_{L2} = \lambda \sum_i w_i^2$

where the $w_i$ are the network weights and $\lambda$ controls the strength of the penalty.

It has the effect of penalising large weights within the network, and intuitively, penalising localised gradient specialisation. By applying L2 regularisation we are encouraging the network to fit through the data, and not to the data.
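As a concrete example, here is a minimal sketch of adding an L2 penalty to the same kind of network, using Keras kernel regularisers; the penalty weight of 1e-3 is an assumed value.

import tensorflow as tf

l2 = tf.keras.regularizers.L2(1e-3)
net_l2 = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="gelu", kernel_regularizer=l2,
                          input_shape=(1,)),
    tf.keras.layers.Dense(128, activation="gelu", kernel_regularizer=l2),
    tf.keras.layers.Dense(2, kernel_regularizer=l2),
])
net_l2.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="mse")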

For the parts of the domain where we have data, L2 regularisation improves the usefulness of our model. However, in the absence of data spanning the entire domain, the model is not able to extrapolate much beyond the available data.

Training a neural network on noisy, sparse, and incomplete data using L2 regularisation. The model is more useful in areas where data exists, but is not able to extrapolate across the domain. Image by Author

Enter The Physics Informed Neural Network

Much like training with L2 regularisation, a PINN minimises the data loss, but also uses known physics as an additional regularisation term.

Essentially, we can say “fit the data, but make sure the solution is consistent with the physics equations that we know about”.

The network’s objective is to learn f(t) using supervised learning on the available data, so we have the standard data loss of Mean Squared Error (MSE) between the regression targets and the predicted values.

However, we also know that the acceleration of a projectile is given by

$\mathbf{a}(t) = -\mu\,\|\mathbf{v}(t)\|\,\mathbf{v}(t) + \mathbf{g}$

We can incorporate this knowledge into the training of our neural network.

Since the ordinary differential equations (ODEs)

$\mathbf{v}(t) = \frac{d\mathbf{s}}{dt}$

and

$\mathbf{a}(t) = \frac{d^2\mathbf{s}}{dt^2}$

give us the values we need for this equation, we can use the auto-differentiation features of our machine learning (ML) library to take the first and second order derivatives (the gradients) of the network with respect to the network’s input (time t) to obtain both the velocity and acceleration vectors. We may then substitute these derivatives into the equation to get the following:

$\frac{d^2 f}{dt^2} = -\mu\,\Big\|\frac{df}{dt}\Big\|\,\frac{df}{dt} + \mathbf{g}$

So we are saying that for our network to respect the known physics, the second derivative of our network with respect to time (acceleration) (left hand side) should be equal to the function of the first derivative (velocity) (right hand side). In other words, there should be 0 difference between them:

$\frac{d^2 f}{dt^2} + \mu\,\Big\|\frac{df}{dt}\Big\|\,\frac{df}{dt} - \mathbf{g} = 0$

How can we apply this knowledge to the training process? We can simply minimise the error (MSE) of this equation as an additional loss.

This gives us a physics loss term, which attempts to minimise the error between the gradients of the network (velocity and acceleration) and the corresponding equations given by known physics. Minimising both the data loss and the (weighted) physics loss gives us a PINN. That’s all there is to it ;). The data loss is trained with samples from the training data, and the physics loss is trained with samples from across the entire domain of interest (which we specify). As the physics loss requires only inputs across the domain and no targets, we are free to choose any appropriate sampling strategy (e.g. uniform sampling being a simple strategy).
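As a concrete illustration of the sampling step, here is a minimal sketch of drawing uniform collocation points for the physics loss; the domain bounds and number of points are assumptions.

import tensorflow as tf

def sample_physics_points(n=200, t_min=0.0, t_max=3.0):
    # the physics loss needs only inputs (no targets), so we are free
    # to sample times anywhere in the domain of interest
    return tf.random.uniform((n, 1), minval=t_min, maxval=t_max)

t_phys = sample_physics_points()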

One thing remains, however: μ (the drag coefficient) is a variable which we do not know. But we can simply make μ a trainable variable alongside the network parameters, which means that μ will be discovered during the training process, i.e. we can learn the drag coefficient of our projectile from the data.
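In TensorFlow this amounts to a single trainable variable; the initial value of 0.1 below is an arbitrary assumption.

import tensorflow as tf

# the unknown drag coefficient, optimised alongside the network weights
mu = tf.Variable(0.1, dtype=tf.float32, name="drag_coefficient")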

For all neural networks throughout these examples, the networks use an architecture of 2 fully connected layers of 128 neurons, using Gaussian error linear unit (GELU) activations. Although most examples in the literature use tanh activations, we find GELU provides stronger theoretical and empirical benefits [2] [3] [4] (try running any open source PINN examples and swapping tanh for GELU). As we are minimising a loss on the gradients of the network, any activation function we use needs to be differentiable everywhere so that the gradients are appropriately continuous. This precludes activations such as standard ReLU, which is piecewise linear: its first derivative is piecewise constant and its second derivative is zero almost everywhere, so it cannot provide useful higher-order gradients for the physics loss.

Training a neural network on noisy, sparse, and incomplete data using physics regularisation (PINN). The physics loss allows the network to both regularise through data-points, as well as extrapolate outside of the training data in a manner consistent with the known physics. It’s not possible to discover the ground-truth solution here due to the low volume of noisy data. With more data and/or less noise, the network is able to learn the ground-truth solution with extremely high accuracy. Image by Author
The data loss is used to ensure a fit through the training data (top), and the physics loss is used to ensure consistency with known physics across the domain (bottom). Image by Author

And if you prefer code to equations, here’s a snippet of code using TensorFlow to implement the above training:

...

@tf.function
def train(t_train, s_train, t_phys):
    # Data loss

    # predict displacement (a direct call keeps the forward pass
    # differentiable; Keras Model.predict returns NumPy arrays and
    # cannot be used for gradients inside a tf.function)
    s_train_hat = net(t_train)
    # MSE loss between training data and predictions
    data_loss = tf.math.reduce_mean(
        tf.math.square(s_train - s_train_hat)
    )

    # Physics loss

    # predict displacement at the collocation (physics) points
    s_phys_hat = net(t_phys)
    # split into individual x and y components
    s_x = s_phys_hat[:, 0]
    s_y = s_phys_hat[:, 1]
    # take the gradients to get predicted velocity and acceleration
    v_x = tf.gradients(s_x, t_phys)[0]
    v_y = tf.gradients(s_y, t_phys)[0]
    a_x = tf.gradients(v_x, t_phys)[0]
    a_y = tf.gradients(v_y, t_phys)[0]
    # combine individual x and y components into velocity and
    # acceleration vectors
    v = tf.concat([v_x, v_y], axis=1)
    a = tf.concat([a_x, a_y], axis=1)
    # as acceleration is the known equation, this is what we want to
    # perform gradient descent on.
    # therefore, prevent any gradients flowing through the higher
    # order (velocity) terms
    v = tf.stop_gradient(v)
    # define speed (velocity norm, the ||v|| in the equation) and
    # gravity vector for the physics equation (9.81 is subtracted
    # below, so the gravitational acceleration points downwards)
    speed = tf.norm(v, axis=1, keepdims=True)
    g = [[0.0, 9.81]]
    # MSE between known physics equation and network gradients
    phys_loss = tf.math.reduce_mean(
        tf.math.square(-mu * speed * v - g - a)
    )

    # Total loss

    loss = data_weight * data_loss + phys_weight * phys_loss

    # Gradient step

    # minimise the combined loss with respect to both the neural
    # network parameters and the unknown physics variable, mu
    gradients = tf.gradients(loss, net.train_vars + [mu])
    optimiser.apply_gradients(zip(gradients, net.train_vars + [mu]))

...
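And, as a brief sketch of how this training step might be driven (the number of steps is an assumption), reusing the collocation-sampling helper sketched earlier:

for step in range(20000):
    t_phys = sample_physics_points()   # fresh collocation points each step
    train(t_train, s_train, t_phys)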

This simple example demonstrates the effectiveness of PINNs. With a small amount of noisy data, we have been able to learn a robust and accurate model of the projectile. The PINN has also learned the physics parameters (the drag coefficient) of the projectile in the process. What’s more, as there exists no closed-form solution for the projectile’s displacement, by using a PINN we have obtained an accurate closed-form approximation to the solution.

Projectile Motion (state-based)

Again, most examples in the literature are focused on functions with respect to time (plus other variables). But what about models that are functions of the current state of the system rather than explicitly of time? In other words, a time-stepping model.

Let’s re-phrase the projectile motion problem as a function which takes the current state of the system, in this case the velocity, and gives us the acceleration:

$\mathbf{a}(\mathbf{v}) = -\mu\,\|\mathbf{v}\|\,\mathbf{v} + \mathbf{g}$

We can define 2 functions; one which takes in the velocity x and y components and gives us the acceleration x component, and one which takes in the velocity x and y components and gives us the acceleration y component:

$f(v_x, v_y) = a_x = -\mu \sqrt{v_x^2 + v_y^2}\, v_x$

$h(v_x, v_y) = a_y = -\mu \sqrt{v_x^2 + v_y^2}\, v_y - g$

where $g = 9.81\,\mathrm{m/s^2}$ is the magnitude of gravity.

To gain some intuition, let’s consider the following trajectory generated by these functions. We start with an initial displacement and velocity, and query the functions to obtain the acceleration applied at that point in time. We then integrate the acceleration into the velocity using a numerical integration method such as Runge–Kutta, and integrate the velocity again to obtain the displacement. We repeat this process to obtain a trajectory over time.
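A minimal sketch of this rollout is shown below, using simple Euler steps rather than Runge–Kutta purely for brevity; the drag coefficient, time step, and initial state are assumptions.

import numpy as np

mu, g, dt = 0.05, 9.81, 0.01

def f(vx, vy):   # acceleration x component (drag only)
    return -mu * np.hypot(vx, vy) * vx

def h(vx, vy):   # acceleration y component (drag plus gravity)
    return -mu * np.hypot(vx, vy) * vy - g

s, v = np.array([0.0, 0.0]), np.array([20.0, 20.0])
trajectory = [s.copy()]
for _ in range(300):
    a = np.array([f(*v), h(*v)])
    v = v + dt * a     # integrate acceleration into velocity
    s = s + dt * v     # integrate velocity into displacement
    trajectory.append(s.copy())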

The displacement plot (top) below visualises the trajectory over time by following this method. The acceleration vector field plot (bottom left) shows the accelerations for any given velocity in the domain, and the acceleration x and y component plots (bottom middle and right) show the individual acceleration components of the vector field (the functions f and h).

Visualisation of running the time-stepping model consisting of f and h to produce a trajectory. Image by Author

Physics & Gradient Descent

We may ask ourselves, if we are able to learn the values for the parameters of the physics functions, why would we even need to involve neural networks at all? After all, we are using the known physics anyway to regularise the network. Why not just use gradient descent directly on the physics functions?

If we know that the functions governing our data are given exactly by the drag-only physics above,

$f(v_x, v_y) = -\mu \sqrt{v_x^2 + v_y^2}\, v_x, \qquad h(v_x, v_y) = -\mu \sqrt{v_x^2 + v_y^2}\, v_y - g,$

and we can learn the unknown parameter μ, then we can use gradient descent directly on the physics functions.
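A minimal sketch of what this direct fit might look like, assuming we have observed velocity and acceleration samples in hypothetical arrays v_data and a_data (each of shape (N, 2)); the initial value of μ and the learning rate are assumptions.

import tensorflow as tf

mu = tf.Variable(0.01)                       # unknown drag coefficient
optimiser = tf.keras.optimizers.Adam(1e-2)
g = tf.constant([[0.0, -9.81]])              # gravity vector

for step in range(5000):
    with tf.GradientTape() as tape:
        speed = tf.norm(v_data, axis=1, keepdims=True)
        a_pred = -mu * speed * v_data + g    # drag-only physics model
        loss = tf.reduce_mean(tf.square(a_pred - a_data))
    grads = tape.gradient(loss, [mu])
    optimiser.apply_gradients(zip(grads, [mu]))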

Training the physics functions directly using gradient descent. The ground-truth function (grey wireframe) with available training data generated by the function (grey dots), and the physics function being learnt directly through gradient descent (magenta wireframe). Image by Author

In this instance, we are able to perfectly learn an accurate parametrisation of the physics functions through gradient descent. So why don’t we just do this?

There are in fact several reasons. As we have already seen above, there is not always a closed-form solution to use, which rules this method out in many cases.

Even if we do have a closed-form solution available, because the physics functions are limited by their parametrisation, the loss landscape is likely to be non-convex. This means that there is a high probability that learning will converge to a local optimum rather than the global one.

Consider the following function f, for which we have data, and suppose we attempt to learn a parametrisation of the function g through gradient descent. Moving the trainable parameter a away from 0 in any direction increases the error. Therefore 0 is a local optimum, and gradient descent will converge there. However, if we move a all the way to the right, we discover the global optimum, which gradient descent would not have found from its original initialisation value.

Example of a function which will fail to discover a global optimum parametrisation using gradient descent directly on the function. (Image by Author, created using Desmos Graphing Calculator https://www.desmos.com/calculator)

There are of course other optimisation methods than gradient descent…

However, there is a further issue which means that even if this method is successful, it may still not give us a useful model.

In all of the previous examples, the physics functions we have used perfectly represent the function the data was produced from. However, even physics functions are just models, simplifications of the true underlying dynamics.

The projectile motion equations we have used only consider constant drag and gravity. What about lift, orientation and altitude dependent drag, thrust, the Coriolis force, etc.? Unless we are able to specify exactly the underlying physics functions, it is not possible to learn a parametrisation which accurately models the available data.

As an example, let’s add a simple lift term, with coefficient of lift Cl, to the equations we have used to generate the training data. Now let’s try to learn a parametrisation of the physics functions we have used previously (without lift).

Training the physics functions (without lift) on data produced from a different function (with lift). The learnt parametrisation of the known physics functions isn’t able to give us a useful model of the underlying data. Image by Author

Ah… not ideal. There exists no possible parametrisation which accurately models our data.

Physics Informed Neural Network

We can apply the same techniques as previously to train a PINN on the time-stepping version of this task. Whereas before we applied ordinary differential equations (ODEs) as we had a single independent variable (time), we now have multiple independent variables (velocity x and y components).

We can implement the multiple functions as a single multi-output neural network.

We can also take the gradients of these functions with respect to the independent variables. As there are multiple independent variables, these are partial derivatives, and the known physics is expressed as partial differential equations (PDEs):

$\frac{\partial f}{\partial v_x} = -\mu\left(\|\mathbf{v}\| + \frac{v_x^2}{\|\mathbf{v}\|}\right)$ (the gradient of the acceleration x component with respect to the velocity x component).

$\frac{\partial h}{\partial v_y} = -\mu\left(\|\mathbf{v}\| + \frac{v_y^2}{\|\mathbf{v}\|}\right)$ (the gradient of the acceleration y component with respect to the velocity y component).

$\frac{\partial f}{\partial v_y} = -\mu\,\frac{v_x v_y}{\|\mathbf{v}\|}$ (the gradient of the acceleration x component with respect to the velocity y component).

$\frac{\partial h}{\partial v_x} = -\mu\,\frac{v_x v_y}{\|\mathbf{v}\|}$ (the gradient of the acceleration y component with respect to the velocity x component).

where $\|\mathbf{v}\| = \sqrt{v_x^2 + v_y^2}$.

Which are visualised below:

Fun fact: the first derivative of acceleration is called jerk, and the second, third, and fourth derivatives are snap, crackle, and pop :)

Partial derivatives as per the above equations. Image by Author

By doing so, we obtain the partial derivatives for the known physics. Similar to previously, we can formulate a loss function which minimises the error between the gradients of our network outputs (the dependent variables) with respect to the inputs (the independent variables) and these known physics equations. Minimising the data loss and the (weighted) physics loss gives us a PINN. Intuitively, the weighting term says how much emphasis we want to place on conforming to the physics vs. the available data.
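As a minimal sketch of what this physics loss might look like for the state-based case: a multi-output network (hypothetically named net_state) maps (v_x, v_y) to (a_x, a_y), and tf.GradientTape provides the four partial derivatives to compare against the drag-only physics partials above. The collocation velocities v_phys and the small epsilon are assumptions.

import tensorflow as tf

def state_physics_loss(net_state, mu, v_phys, eps=1e-8):
    with tf.GradientTape(persistent=True) as tape:
        tape.watch(v_phys)
        a_pred = net_state(v_phys)        # shape (N, 2): predicted (a_x, a_y)
        ax, ay = a_pred[:, 0], a_pred[:, 1]
    # network partials: rows of the Jacobian with respect to (v_x, v_y)
    dax = tape.gradient(ax, v_phys)       # [da_x/dv_x, da_x/dv_y]
    day = tape.gradient(ay, v_phys)       # [da_y/dv_x, da_y/dv_y]
    del tape

    vx, vy = v_phys[:, 0], v_phys[:, 1]
    speed = tf.sqrt(vx**2 + vy**2) + eps
    # known-physics (drag only) partial derivatives
    dax_dvx = -mu * (speed + vx**2 / speed)
    dax_dvy = -mu * vx * vy / speed
    day_dvx = -mu * vx * vy / speed
    day_dvy = -mu * (speed + vy**2 / speed)

    return (tf.reduce_mean(tf.square(dax[:, 0] - dax_dvx)) +
            tf.reduce_mean(tf.square(dax[:, 1] - dax_dvy)) +
            tf.reduce_mean(tf.square(day[:, 0] - day_dvx)) +
            tf.reduce_mean(tf.square(day[:, 1] - day_dvy)))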

By training this PINN on the task, we are able to fit the network to the data, but also to regularise and extrapolate based on the given known physics, as well as learning the parametrisation of the physics functions (the drag coefficient in this case). We are able to produce an accurate model, and because a neural network learns the function, the PINN neither requires a closed-form solution nor converges to the local optima described above. Although the available data consists of a single trajectory, we have been able to obtain an accurate model of the projectile over the entire domain (i.e. from any state), not just close to the training data trajectory.

Training a PINN on the task. The ground-truth function (grey wireframe), training data produced by the ground-truth function (grey dots), physics function parametrised by the variables learnt by the PINN (magenta wireframe), and PINN solution (green surface). Image by Author

As we saw previously when training the physics function directly using gradient descent, unless the given known physics were an exact match to the function that produced the data, it was not possible to produce a useful model. However, as a PINN generates a model based on both the available data and the known physics, we are able to produce a model which combines the best of both.

Let’s re-visit the example of generating training data from a function which includes lift, but provide the PINN with known physics which do not include lift (the drag-only functions f and h above).

The PINN is able to learn a function which fits the training data (from the ground-truth function with lift), but also ensures as much consistency as possible with the known physics. Where we have data, the data loss ensures a fit against the data and the physics loss acts as a regularisation term. Where we don’t have data, the physics loss allows extrapolation based on the gradient of the known physics function.

Training a PINN on data (grey dots) produced from a different function (with lift, grey wireframe) to that we have informed the PINN (green surface) about (magenta wireframe). Image by Author

We can see that the PINN (green) produces a more useful model of the ground-truth (grey) than the best direct fit of the physics function (magenta). The PINN is able to fit the data points, which were produced with lift, yet still blend this with regularisation and extrapolation from the gradients of the best-fit parametrised known physics function without lift. Depending on our needs, we could also take the approach of learning the parameters of the physics functions through training the PINN, and then use the learnt parametrised physics model directly. This trades higher interpretability and transparency (inference runs through known physics) against a potentially lower-fidelity model (the physics may not fit the data accurately).

Conclusion

By reading this article, we have gained an understanding of how and why to use physics informed neural networks, and of the differences between the various approaches. PINNs provide a means of learning robust and accurate models of systems where we are able to provide existing domain knowledge in the form of known equations that govern the data, even in situations where the equations don’t exactly match the data. The additional physics information allows for the discovery of variables in known equations, and allows physics-consistent solutions to be learned with far less data than pure data-driven learning alone.

[1] (PINNs) https://maziarraissi.github.io/PINNs/

[2] (GELU Paper) https://arxiv.org/abs/1606.08415

[3] (GELU TDS) https://towardsdatascience.com/on-the-disparity-between-swish-and-gelu-1ddde902d64b

[4] (Activation Functions) https://mlfromscratch.com/activation-functions-explained/


Ian is a machine learning specialist and PhD student with a particular interest in reinforcement learning, simulation, and modelling.