Neural network from TENET exploiting time inversion

Modelling dynamical systems with recurrent neural networks

Kirill Tsyganov
Towards Data Science


Some time ago I used a time series forecasting approach based on time inversion, beautifully incorporated in a special type of recurrent neural network invented by Dr. Hans-Georg Zimmermann, a world-class deep learning practitioner who for many years led research on neural networks applied to industrial and financial problems at Siemens [1]. The underlying philosophical ideas he relied on go beyond the scope of Christopher Nolan’s TENET, but still, the approach remains super legit.

The Idea of TENET

A dynamical system is a group of related things whose state changes over time. Open systems react to an environment, while closed systems aren’t affected by the environment and function completely autonomously. They are modelled as follows:

(1) Open and closed dynamical system models
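
In equations, a minimal sketch of the two models, using a generic state-space notation (the symbols s for the internal state, u for the external inputs and y for the observable outputs are my choice, roughly following [2]):

```latex
% open dynamical system: the next state depends on the current state and the external input
s_{t+1} = f(s_t, u_t), \qquad y_t = g(s_t)

% closed (autonomous) dynamical system: no external inputs
s_{t+1} = f(s_t), \qquad y_t = g(s_t)
```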

The models above assume that information in the dynamical system flows forward in time (causally), from the past to the present.

BUT there are dynamical systems where the information flow proceeds in the reverse time direction (retro-causal), from the future to the present [3]. These are dynamical systems whose changes over time are influenced by planning, i.e. by predicting future observables. For example, most market prices are determined not only by supply and demand, but also by the planning of the market participants. In such cases we may benefit from models that assume a mixture of causal and retro-causal influences.

(2) Dynamical system model with causal & retro-causal influences

Model behind the curtain

1. Motivation for closed dynamical systems

Let’s first consider the causal open dynamical system model (1). If for the transition function we take an activation function applied to a linear state transition and add an output equation, you may recognize a standard recurrent neural network (RNN), which by construction is very well suited to modelling dynamical systems and is widely used to handle sequential data, e.g. time series and natural language.

(3) Simple recurrent neural network for an open dynamical system
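
As a sketch, with tanh as the activation function and A, B, C as my placeholder names for the state transition, input and output weight matrices, model (3) reads roughly:

```latex
s_{t+1} = \tanh(A s_t + B u_t)  % state transition: linear map of state and input, squashed by tanh
y_t     = C s_t                 % output equation extracting the observables from the state
```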

It’s quite a complicated model: it has to account for both the internal autonomous system state and the external inputs. So let’s simplify the model by reformulating the problem as a more complex one and consider a closed dynamical system instead. For that, we have to expand the internal system space by adding an external subsystem containing the dynamics of the external variables. This tradeoff between model simplicity and problem complexity is not cheap and will cost us later, when we teach the model the relations between the variables of the expanded internal space.

(4) Transforming open dynamical system into closed one

But we have obtained a simple-looking model (5) whose long-term consistency depends only on the strength of the interrelations between the internal system variables, and not on external inputs, which can also be difficult to obtain in practice. Note that in the new model we also got rid of the complicated output extraction: our output y is simply a subset of the system state s, without any changes.

(5) RNN modelling a closed dynamical system
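
As an illustration, here is a minimal PyTorch sketch of model (5); the dimensions, the initialisation and the convention that the first output_dim state variables are the observables are my assumptions, not the author’s implementation (see [4] for that):

```python
import torch

state_dim, output_dim = 20, 3                 # expanded internal state, observable part
A = 0.1 * torch.randn(state_dim, state_dim)   # the only weights of model (5)

def step(s):
    """One transition of the closed-system RNN (5)."""
    s_next = torch.tanh(A @ s)                # autonomous state transition, no external inputs
    y_next = s_next[:output_dim]              # output = observable slice of the state, unchanged
    return s_next, y_next

# roll the autonomous system forward from some initial state
s = torch.randn(state_dim)
trajectory = []
for _ in range(10):
    s, y = step(s)
    trajectory.append(y)
```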

2. Teaching the Causal / Forward-in-Time model

In the model (5) all fitting parameters sit in the weight matrix A. In order to fit them, we use the standard error back-propagation-through-time (BPTT) algorithm with a special trick, called teacher forcing, that corrects errors at each time step.

The objective/loss function, which we want to minimize by tweaking the weight matrix A and the initial state bias, is the sum of squared errors over all time steps.

(6) Objective/loss function — identification of the system
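
In symbols, with y_t the model output and y_t^d the observed target at time step t (my paraphrase of the identification task described in [2]):

```latex
\sum_{t=1}^{T} \left( y_t - y_t^d \right)^2 \;\longrightarrow\; \min_{A,\, s_0}
```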

Fitting the RNN, i.e. back-propagation through time, involves:
1) a forward pass, where we compute the outputs at each time step;
2) error calculation at each time step, followed by the total error calculation;
3) calculating the gradient of the total error with respect to the model parameters by backpropagating the error through time;
4) updating the weights according to the gradient.
Repeat steps 1–4 until the model error is low enough.

Since there is only one history path, i.e. one sequence of observations y, we have only one data sample to train the model on.

In the forward pass of the fitting procedure we use teacher forcing: at each time step the target is substituted into the part of the internal system state s that corresponds to the observable output y, while the rest of the internal state s, corresponding to the non-observable system variables, remains unchanged. This way we don’t propagate, and therefore don’t accumulate, errors while training the model.

(7) Teacher forcing for error correction
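
Continuing the PyTorch sketch above, the teacher-forced forward pass might look like this (the helper name and the slicing convention are mine):

```python
def forward_teacher_forced(A, s0, targets, output_dim):
    """Forward pass of model (5) with teacher forcing: after every step the
    observable slice of the state is overwritten with the known target."""
    s, errors = s0, []
    for y_target in targets:                       # targets: tensor of shape (T, output_dim)
        s = torch.tanh(A @ s)
        errors.append(s[:output_dim] - y_target)   # per-step error, summed up in the loss (6)
        s = torch.cat([y_target, s[output_dim:]])  # replace the observables, keep the hidden part
    return errors
```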

After the forward pass we can easily calculate the total error (6) using the model outputs and the targets. Then we have to compute the gradient of the error with respect to the model parameters.

(8) Error gradient with respect to weight matrix

In the last step of the algorithm we use the calculated gradient to update the model parameters, i.e. the state transition weight matrix A:

(9) Weight matrix update in BPTT
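
Putting steps 1–4 together, a rough training loop could look as follows; it reuses the forward_teacher_forced helper sketched above, runs on dummy data, and uses plain gradient descent as in (9). All names and hyperparameters here are illustrative; see [4] for the author’s implementation:

```python
import torch

state_dim, output_dim, T = 20, 3, 50
targets = torch.randn(T, output_dim)                 # one observed history path (dummy data here)

A = torch.nn.Parameter(0.1 * torch.randn(state_dim, state_dim))
s0 = torch.nn.Parameter(torch.zeros(state_dim))      # trainable initial state bias
opt = torch.optim.SGD([A, s0], lr=1e-3)

for epoch in range(2000):
    errors = forward_teacher_forced(A, s0, targets, output_dim)   # steps 1-2: forward pass + errors
    loss = torch.stack(errors).pow(2).sum()                       # total error (6)
    opt.zero_grad()
    loss.backward()                                               # step 3: backpropagation through time
    opt.step()                                                    # step 4: gradient update of A and s0 (9)
```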

The recurrent neural network with teacher forcing described above is called an Error Correction Neural Network (ECNN) or a Historical Consistent Neural Network (HCNN) [2].

Note that the weight matrix A can be quite large due to the autonomous system expansion (4): the number of scalar weights equals the total number of system variables squared. It grows rapidly as we increase the system space and is a bottleneck of this modelling approach. A good practical trick to speed up training is to use a sparse weight matrix A: it is closer to reality to assume a few connections between the system variables than to start from the assumption that everything is interconnected.
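
One simple way to sketch this trick is to fix a random binary mask and apply it to A in every forward step, so that most connections are forced to stay at zero (the 10% density is an arbitrary choice for illustration):

```python
sparsity = 0.10                                               # keep roughly 10% of the connections
mask = (torch.rand(state_dim, state_dim) < sparsity).float()  # fixed random sparsity pattern

# in the forward pass, use the masked transition instead of the dense one:
# s = torch.tanh((A * mask) @ s)
```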

You can also check my Python implementation of this part [4].

3. Time inversion: from Causal to Retro-Causal model

The formulation of reverse-engineering principle that I found in Dr. Zimmermann’s presentation clarifies the intrinsic motivation behind the retro-causal model:

When we are able to identify the goals of the agents we can try to explain the dynamics backward from the goals

Indeed, we are not restricted to learning only causal dependencies, and this is what gets exploited here. Let’s pretend that we live in a time-inverted world and simply use the same model architecture to learn the system transition from the future to the past.

(10) Retro-causal RNN for the time-inverted system

In the same way, we apply BPTT with teacher forcing to fit the inverted-time system transition weight matrix and the initial state bias.

For the implementation of this part we basically just need to train the standard causal RNN from part 2 on the inverted sequence.
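
In code this amounts to little more than flipping the target sequence before training, e.g. continuing the sketch above:

```python
targets_reversed = torch.flip(targets, dims=[0])   # reorder the observations from future to past
# then fit a second, independent transition matrix on targets_reversed
# with exactly the same training loop as in part 2
```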

4. [TENET] Superposition of Causal and Retro-Causal neural networks

Finally, we symmetrically combine the normal-time and inverted-time networks into a single network [3].

(11) Causal & Retro-Causal neural networks combined
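
Roughly speaking, the combined model runs a causal state s forward in time and a retro-causal state r backward in time, and the prediction at every time step is the sum of their observable parts (A and A' denote the two transition matrices; the exact formulation is given in [3]):

```latex
s_{t+1} = \tanh(A\, s_t), \qquad r_{t-1} = \tanh(A'\, r_t), \qquad y_t = [\mathrm{Id}, 0]\, s_t + [\mathrm{Id}, 0]\, r_t
```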

Training this network again requires BPTT with teacher forcing.

(12) Teacher forcing for Causal & Retro-Causal networks combined

My implementation of this part in PyTorch: https://github.com/uselessskills/hcnn/blob/master/tutorials/crc_hcnn.ipynb

Conclusion

We began with the time inversion idea presented in TENET and then mapped it to a model fueled by recurrent neural networks. Or was it really the other way around?

Special thanks to Dr. Alexey Minin, who introduced this remarkable approach to me and patiently advised me many times while I tried to understand it, until I finally felt it.

References

[1] https://www.researchgate.net/profile/Hans-Zimmermann-4

[2] Zimmermann HG., Tietz C., Grothmann R., Historical Consistent Neural Networks: New Perspectives on Market Modeling, Forecasting and Risk Analysis, Springer, 2013

[3] Zimmermann HG., Grothmann R., Tietz C., Forecasting Market Prices with Causal-Retro-Causal Neural Networks, Springer, 2012

[4] Python package with my implementation of the networks described (HCNN, Causal & Retro-Causal NN): https://github.com/uselessskills/hcnn
