Introduction
Dynamic systems, in simple terms, are systems that evolve over time. What evolves over time? The states of the system. What are states? States are variables that describe the system. For instance, in the case of the motion of celestial bodies, we could take the current position and velocity as the states. These states can then be used to predict the future position of the body. In this example, the motion of the celestial body is the dynamic system.
More formally, any system whose future states depend on the current state together with a set of evolution rules is called a dynamic system. Going back to the celestial body example, we know that its motion is governed by the laws of gravitation.
In order to better understand these systems and make reasonable predictions, we rely on mathematical equations to represent them. In the case of dynamic systems, we use ordinary differential equations (ODEs). Simply put, ODEs are equations relating functions and their derivatives. In dynamic systems in particular, we deal with derivatives with respect to time. A general form of a single-variable dynamic system can be represented as the ODE shown below.
dx(t)/dt = f(x(t), t)

where x(t) is the state and the function f encodes the evolution rules.
There are several numerical methods available to solve ODEs, such as Euler’s method, the implicit Euler method, Runge-Kutta methods, and more. So why do we need another method? Why do we need deep learning? While numerical methods can solve ODEs once we have them, in practice it is quite complicated to obtain these equations in the first place for highly nonlinear systems, because it is difficult to account for all the factors that affect the system. In the case of weather prediction, for example, it is simply impossible to even know all the variables involved. We therefore rely on data to model these systems.
In a first-principles model, the system is represented by a set of mathematical equations derived from physical conservation laws such as mass, energy, and momentum balances. In a data-based (black-box) model, by contrast, we try to learn the behaviour of the process from the data collected. As shown below, we do not know the true underlying phenomena; we therefore use the input and output data to identify a model that best represents the system.
![Figure 1: Data-based model representation](https://towardsdatascience.com/wp-content/uploads/2021/02/1kCslcXGUwz7naOIU0nV0Mw.png)
In this article, we are going to focus on long short-term memory (LSTM) networks to model nonlinear dynamic systems. A basic understanding of recurrent neural networks (RNNs) and LSTMs will be helpful. To better understand the inner workings of RNNs and LSTMs, refer to these beautifully illustrated articles:
Illustrated Guide to Recurrent Neural Networks
Illustrated Guide to LSTM’s and GRU’s: A step by step explanation
Dynamic system under consideration
Let us consider a black-box nonlinear process and define (x₁(k), x₂(k)) as the states and (u₁(k), u₂(k)) as the inputs at time k. (x₁(k+1), x₂(k+1)) are then the states at the next time step.
(x₁(k+1), x₂(k+1)) = f(x₁(k), x₂(k), u₁(k), u₂(k))

where f is the unknown nonlinear process.
Say the nonlinear process is currently at the state (x₁(k), x₂(k)) and we excite it with the input signal (u₁(k), u₂(k)). We then measure the output and store it as (x₁(k+1), x₂(k+1)), as shown above. We repeat this experiment several times and record the data. Our goal is then to use this data to model the underlying process. The methodology is the same for any process where a client requires a data-based model.
In order to do justice to my degree in Chemical Engineering, I will be considering a nonlinear continuous stirred tank reactor (CSTR) as our dynamic system. For those of you familiar with a CSTR, the two states x₁ and x₂ are the concentration of the reactant (Cₐ) and the temperature of the reactor (T) respectively, while the two inputs u₁ and u₂ are the inlet concentration (Cₐ₀) and the heat supplied to the reactor (Q) respectively. Do not panic if you have never heard of this; it is just to give some context to the system.
Data Generation
Unfortunately, I do not have a chemical reactor lying in my backyard to perform this experiment (it would be weird if I did). Fortunately, such a system can be accurately represented by the set of ODEs shown below.
dCₐ/dt = (F/V)(Cₐ₀ − Cₐ) − k₀ e^(−E/RT) Cₐ²

dT/dt = (F/V)(T₀ − T) − (ΔH/ρCₚ) k₀ e^(−E/RT) Cₐ² + Q/(ρCₚV)

where F is the feed flow rate, V the reactor volume, k₀ the pre-exponential constant, E the activation energy, R the gas constant, ΔH the heat of reaction, ρ the density, and Cₚ the heat capacity of the reacting liquid (see the references at the end for the full model).
These equations are enough to completely describe the underlying process. In practice, however, the data would be contaminated with measurement noise and unmeasured disturbances. Therefore, to generate data for this project, we can integrate the set of ODEs shown above and add random noise to obtain realistic data. The first few samples are shown below.

In practice, data is usually collected in batches. To replicate this, I have collected 15 batches of data. In each batch, I start at a different initial state and bombard the process with random input signals. An example plot of batch number 15 is shown below.

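To make the procedure concrete, here is a minimal sketch of how such noisy batch data could be generated by integrating the ODEs above with a simple forward Euler scheme. The function names and all parameter values here are illustrative assumptions, not the exact ones behind the plots in this article.

```python
import numpy as np

def cstr_rhs(x, u):
    """Right-hand side of the CSTR ODEs above.
    All parameter values are illustrative placeholders (assumed)."""
    Ca, T = x          # reactant concentration, reactor temperature
    Ca0, Q = u         # inlet concentration, heat input
    F_V   = 5.0        # dilution rate F/V (assumed)
    k0    = 8.46e6     # pre-exponential constant (assumed)
    E_R   = 6013.9     # activation energy over gas constant, E/R (assumed)
    dH    = -1.15e4    # heat of reaction, exothermic (assumed)
    rhoCp = 231.0      # density times heat capacity (assumed)
    T0    = 300.0      # inlet temperature (assumed)
    r = k0 * np.exp(-E_R / T) * Ca**2            # reaction rate
    dCa = F_V * (Ca0 - Ca) - r
    dT  = F_V * (T0 - T) - (dH / rhoCp) * r + Q / rhoCp
    return np.array([dCa, dT])

def generate_batch(x0, inputs, dt=1e-2, noise_std=0.01):
    """Simulate one batch: start at x0, apply a random input signal at
    each step, integrate with forward Euler, and add measurement noise."""
    rng = np.random.default_rng()
    rows, x = [], np.asarray(x0, dtype=float)
    for u in inputs:
        x = x + dt * cstr_rhs(x, u)                     # one Euler step
        y = x + noise_std * rng.standard_normal(2)      # noisy measurement
        rows.append(np.concatenate([y, u]))
    return np.array(rows)                               # columns: Ca, T, Ca0, Q
```

Calling generate_batch once per initial state with a fresh random input sequence would then yield the 15 batches described above.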
It is important to scale this data before training the model, because the variables under consideration differ in magnitude by orders of magnitude.
LSTM Network

Objective: Build an LSTM network in PyTorch to model the nonlinear dynamic system discussed above.
We are going to use two hidden layers with 15 and 10 LSTM cells respectively. The inputs to this network are the current states and input signals, while the output from the network is the future state of the system.
I’ll explain the procedure and provide code alongside so you can use it for your own system.
I have stored the entire dataset in dataframe. Let us start by batching the data by defining the batch size. I have collected 101 data points in each batch. I use 14 batches of data to train the LSTM network and the remaining batch to test the model.
I am using batch number 5 as the test set. We make sure the data points are of float type and split the data into a train set and a test set.
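A minimal sketch of this step, assuming the full dataset sits in a pandas DataFrame dataframe with the batches stacked one after another (101 rows each):

```python
import numpy as np

BATCH_LEN  = 101      # data points collected per batch
N_BATCHES  = 15
TEST_BATCH = 5        # batch held out for testing

# cast to float and slice the stacked data into individual batches
data = dataframe.values.astype('float32')
batches = [data[i * BATCH_LEN:(i + 1) * BATCH_LEN] for i in range(N_BATCHES)]

test_set      = batches[TEST_BATCH - 1]
train_batches = [b for i, b in enumerate(batches) if i != TEST_BATCH - 1]
```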
As discussed earlier, scaling is crucial for our data. We create two MinMaxScaler objects to scale the data between -1 and 1 and save them offline as pickle objects.
Note: The testing set was not included to fit the scaling parameters to avoid any form of data leakage.
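A sketch of this step with scikit-learn's MinMaxScaler; the split into an input scaler and an output scaler, the column layout, and the file names are my assumptions:

```python
import pickle
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# fit the scalers on the training batches only (no data leakage)
train_data = np.vstack(train_batches)

# one scaler for the 4 network inputs (x1, x2, u1, u2),
# one for the 2 outputs (the future states)
in_scaler  = MinMaxScaler(feature_range=(-1, 1)).fit(train_data)
out_scaler = MinMaxScaler(feature_range=(-1, 1)).fit(train_data[:, :2])

train_scaled = [in_scaler.transform(b) for b in train_batches]
test_scaled  = in_scaler.transform(test_set)

# persist the scalers so the same transform can be reused later
with open('in_scaler.pkl', 'wb') as f:
    pickle.dump(in_scaler, f)
with open('out_scaler.pkl', 'wb') as f:
    pickle.dump(out_scaler, f)
```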
The normalized train and test sets are then converted to torch tensor objects. We have to present the data in a form the inbuilt PyTorch model can understand. For this purpose, we define a function input_data that takes in the data and a window size and returns a time-series sequence of input and output pairs. The window size refers to how far ahead we predict; in our case, we only need the next immediate output, so the window size is one.

The function input_data takes in the data frame shown above along with the window size and returns a list of tensor objects, shown below. Each element of the list is a tuple pairing an input tensor of four values (the two states and the two input signals) with an output tensor of two values (the next states). This forms our training data.
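A minimal sketch of what input_data could look like (the exact implementation may differ):

```python
import torch

def input_data(seq, ws):
    """Pair every window of `ws` rows (x1, x2, u1, u2) with the
    states measured `ws` steps later, as (input, label) tuples."""
    out = []
    for i in range(len(seq) - ws):
        window = torch.tensor(seq[i:i + ws], dtype=torch.float32)    # (ws, 4)
        label  = torch.tensor(seq[i + ws, :2], dtype=torch.float32)  # (2,)
        out.append((window, label))
    return out

# one list of (input, label) tuples per training batch, window size 1
train_seqs = [input_data(b, 1) for b in train_scaled]
test_seq   = input_data(test_scaled, 1)
```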

We are now ready to define the LSTM model with two hidden layers. The class LSTMmodel, sketched below, works for any input size, output size, and number of neurons in the hidden layers. The forward function within this class takes in the sequence of data and returns the predictions.
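Here is a minimal sketch of such a class; the exact reshaping and layer handling in the original code may differ:

```python
import torch
import torch.nn as nn

class LSTMmodel(nn.Module):
    """Two stacked LSTM layers followed by a linear output layer."""
    def __init__(self, input_size=4, hidden1=15, hidden2=10, output_size=2):
        super().__init__()
        self.lstm1  = nn.LSTM(input_size, hidden1)
        self.lstm2  = nn.LSTM(hidden1, hidden2)
        self.linear = nn.Linear(hidden2, output_size)

    def forward(self, seq):
        # nn.LSTM expects (seq_len, batch, features); batch size is 1 here
        out, _ = self.lstm1(seq.view(len(seq), 1, -1))
        out, _ = self.lstm2(out)
        pred = self.linear(out.view(len(seq), -1))
        return pred[-1]    # prediction for the last step of the window
```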
We now focus on the problem at hand and define an LSTMmodel with an input size of 4, two hidden layers with 15 and 10 neurons respectively, and an output size of 2. We also define a mean squared error loss and an Adam optimizer.
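For example (the learning rate of 0.001 and the seed are my assumptions):

```python
torch.manual_seed(42)    # for reproducibility

model     = LSTMmodel(input_size=4, hidden1=15, hidden2=10, output_size=2)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
```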
All the hyperparameters used here were obtained using cross-validation.
Here, we train the model for 100 epochs, printing the training loss every 10 epochs. It is important to make sure the model is trained batch by batch.
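A sketch of that loop, using the windowed training batches train_seqs from above:

```python
epochs = 100

for epoch in range(epochs):
    for batch in train_seqs:            # loop over the 14 training batches
        for seq, label in batch:        # loop over (input, label) pairs
            optimizer.zero_grad()
            y_pred = model(seq)
            loss = criterion(y_pred, label)
            loss.backward()
            optimizer.step()
    if (epoch + 1) % 10 == 0:
        print(f'epoch {epoch + 1:3d}  training loss: {loss.item():.6f}')
```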
We are now ready to make predictions on the test set, batch number 5. We set the model to evaluation mode and let PyTorch know that we are no longer interested in gradients. We make predictions on the test set and store them in preds.
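A sketch of the inference step (out_scaler is the output scaler fitted earlier):

```python
model.eval()
preds = []

with torch.no_grad():                  # gradients are not needed at inference
    for seq, _ in test_seq:            # windowed test batch (batch number 5)
        preds.append(model(seq))

# stack and map the predictions back to physical units
preds = out_scaler.inverse_transform(torch.stack(preds).numpy())
```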

Voila! The figure above shows the comparison of the predicted states with the true values of the states. The performance of the model on the test data is quite good, with an RMSE of 3.27 (1%).
Bonus
For some applications, it is necessary to make predictions more than one time step ahead, that is, over a longer prediction horizon. In such cases, we can tweak the code to feed the predictions back in to make further predictions, as shown below.
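Here is one way such a recursive multi-step prediction could be sketched, assuming the future input signals are known (taken here from the scaled test batch); the horizon length is an assumption:

```python
model.eval()
horizon = 20                                   # prediction horizon (assumed)

seq, _  = test_seq[0]                          # scaled (x1, x2, u1, u2) at k = 0
rollout = []

with torch.no_grad():
    current = seq.clone()
    for k in range(1, horizon + 1):
        next_state = model(current)            # predicted (x1, x2) at step k
        rollout.append(next_state)
        # feed the prediction back in, together with the known future inputs
        u_next  = torch.tensor(test_scaled[k, 2:], dtype=torch.float32)
        current = torch.cat([next_state, u_next]).view(1, -1)

rollout = out_scaler.inverse_transform(torch.stack(rollout).numpy())
```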
The performance is expected to be slightly worse, because the predictions beyond the first time step depend heavily on the first prediction. If the first prediction is off even by a little, the errors accumulate and the trajectory diverges, resulting in poor performance. One way to improve the model is to train it with multiple future predictions in mind.

Conclusion
We come across nonlinear dynamic systems all the time. In order to make forecasts or future predictions, it is imperative to have a good model of these dynamic systems. It is sometimes difficult to model highly nonlinear systems with explicit mathematical equations, so we rely on data-based (black-box) models. In this article, I have presented one such way to model nonlinear dynamic systems using LSTMs. The code I have presented here is very flexible; feel free to use it for your own problem. You can find the code in my GitHub repo.
References
- Heidarinejad, M., J. Liu, and P. D. Christofides. 2012. "Economic model predictive control of nonlinear process systems using Lyapunov techniques." AIChE Journal 58 (3): 855–870. doi:10.1002/aic.12672. https://aiche.onlinelibrary.wiley.com/doi/abs/10.1002/aic.12672
- Wu, Z., A. Tran, D. Rincon, and P. D. Christofides. 2019. "Machine Learning-Based Predictive Control of Nonlinear Processes. Part I: Theory." AIChE Journal 65. doi:10.1002/aic.16729
- https://www.udemy.com/course/pytorch-for-deep-learning-with-python-bootcamp/