Towards Automatic Discovery of Mathematical and Physical Laws: Algebraically-Informed Deep Networks (AIDN)
One of the central problems at the interface of deep learning and mathematics is that of building learning systems that can automatically uncover underlying mathematical laws from observed data. Over the past few years, deep learning techniques have been used to solve many types of equations, including partial differential equations [2], non-linear implicit systems of equations [3], and transcendental equations [5].
In this article we take one step towards building a bridge between solving algebraic equations and deep learning, and introduce AIDN, Algebraically-Informed Deep Networks [1]. Given a collection of unknown functions that must satisfy a collection of constraints or equations, AIDN is a general algorithm for finding these functions using deep learning.
To explain the main idea of the AIDN algorithm and its broad set of applications, we start with an example.
The Yang-Baxter Equation
The Yang-Baxter equation is an equation of the form:

$$(R \times \mathrm{id}) \circ (\mathrm{id} \times R) \circ (R \times \mathrm{id}) = (\mathrm{id} \times R) \circ (R \times \mathrm{id}) \circ (\mathrm{id} \times R),$$

where the function

$$R : A \times A \to A \times A$$

is an unknown function that we would like to find. For this article we can think of A as a subset of a Euclidean domain. Any invertible function R that satisfies the Yang-Baxter equation is called an R-matrix. Finding solutions of the above equation has a very long history, with a broad set of applications including quantum field theory, low-dimensional topology, and statistical mechanics, and it is in general considered a difficult problem.
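As a quick sanity check of this equation (our own illustration, not part of the paper), note that the swap map R(x, y) = (y, x) is a well-known solution of the set-theoretic Yang-Baxter equation; a few lines of Python verify it numerically on random triples:

```python
import random

def R(x, y):
    # the swap map, a well-known (trivial) solution of the Yang-Baxter equation
    return y, x

def R12(x, y, z):
    # R acting on the first two factors, identity on the third: (R x id)
    a, b = R(x, y)
    return a, b, z

def R23(x, y, z):
    # identity on the first factor, R acting on the last two: (id x R)
    b, c = R(y, z)
    return x, b, c

for _ in range(1000):
    x, y, z = random.random(), random.random(), random.random()
    lhs = R12(*R23(*R12(x, y, z)))   # (R x id)(id x R)(R x id)
    rhs = R23(*R12(*R23(x, y, z)))   # (id x R)(R x id)(id x R)
    assert lhs == rhs                # holds exactly for the swap map
```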
Deep learning to the rescue
We now explain how the AIDN algorithm finds a solution of the Yang-Baxter equation above. AIDN casts the problem of finding the solution R as an optimization problem. First, we represent the function R as a deep neural network, which we will call f_R, or simply f.
Since we require f to be invertible, we create another network g that we train to be the inverse of f. In theory we could choose an architecture for f that is invertible by construction, but in our paper we chose to train g to act as the inverse of f. To build intuition for our solution, it is easier to adopt a graphical notation. First we represent the networks f and g by the following two black boxes:
Given this notation, we can represent the fact that the two functions f and g are inverses of each other via the following diagrams:
Here vertical concatenation means composition of the networks. The above diagrams simply mean that both f followed by g and g followed by f give the identity map.
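In code, the two black boxes can be taken to be ordinary multilayer perceptrons. The sketch below is our own minimal PyTorch version (the architecture, width, and activation are illustrative choices, not the settings used in the paper):

```python
import torch.nn as nn

n = 1  # dimension of A; here we take A to be the unit interval [0, 1]

def make_net(width=64):
    # maps a pair in A x A, flattened into a vector in R^{2n}, back to R^{2n}
    return nn.Sequential(
        nn.Linear(2 * n, width), nn.ReLU(),
        nn.Linear(width, width), nn.ReLU(),
        nn.Linear(width, 2 * n),
    )

f = make_net()  # the candidate solution R of the Yang-Baxter equation
g = make_net()  # trained to act as the inverse of f
```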
In the same graphical notation, the Yang-Baxter equation can be represented via the diagrams:
To train these two networks, we define the following objective:

$$\mathcal{L} = \mathcal{L}_1 + \mathcal{L}_2.$$

The above loss function has two purposes: the first term enforces the Yang-Baxter equation, while the second forces the invertibility of the function f.
In more practical terms, to train f and g we create an auxiliary neural network to aid us in the training process. Specifically, the auxiliary network is trained with the loss function:

$$\mathcal{L} = \mathcal{L}_1 + \mathcal{L}_2,$$

where

$$\mathcal{L}_1 = \frac{1}{N}\sum_{i=1}^{N} \big\| (f \times \mathrm{id}) \circ (\mathrm{id} \times f) \circ (f \times \mathrm{id})(x_i, y_i, z_i) - (\mathrm{id} \times f) \circ (f \times \mathrm{id}) \circ (\mathrm{id} \times f)(x_i, y_i, z_i) \big\|^2$$

and

$$\mathcal{L}_2 = \frac{1}{N}\sum_{i=1}^{N} \Big( \big\| g(f(x_i, y_i)) - (x_i, y_i) \big\|^2 + \big\| f(g(x_i, y_i)) - (x_i, y_i) \big\|^2 \Big),$$

where {x_i, y_i, z_i} denote points sampled uniformly from A × A × A, with A ⊂ R^n, typically the unit interval [0, 1]. That is all! After training, the functions f and g satisfy the Yang-Baxter equation, provided a solution exists (see our repo for more details about the implementation).
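For concreteness, here is a minimal training sketch, assuming the networks f and g (and the dimension n) defined above, with A = [0, 1]; the batch size, learning rate, and number of steps are placeholder values, not the paper's settings:

```python
import torch
import torch.nn as nn

mse = nn.MSELoss()

def f_x_id(net, t):
    # (f x id): apply net to the first two coordinates of a triple, identity on the third
    return torch.cat([net(t[:, :2 * n]), t[:, 2 * n:]], dim=1)

def id_x_f(net, t):
    # (id x f): identity on the first coordinate, apply net to the last two
    return torch.cat([t[:, :n], net(t[:, n:])], dim=1)

opt = torch.optim.Adam(list(f.parameters()) + list(g.parameters()), lr=1e-3)

for step in range(10000):
    xyz = torch.rand(256, 3 * n)   # samples (x_i, y_i, z_i) from A x A x A
    xy = torch.rand(256, 2 * n)    # samples (x_i, y_i) from A x A

    # L_1: Yang-Baxter residual
    lhs = f_x_id(f, id_x_f(f, f_x_id(f, xyz)))
    rhs = id_x_f(f, f_x_id(f, id_x_f(f, xyz)))
    loss_yb = mse(lhs, rhs)

    # L_2: invertibility of f
    loss_inv = mse(g(f(xy)), xy) + mse(f(g(xy)), xy)

    loss = loss_yb + loss_inv
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Note that, without further constraints, such a loop may settle on trivial solutions such as the swap map, so in practice one may want to add terms that steer the optimization away from them.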
The general version of the AIDN algorithm
The idea that we explained for the Yang-Baxter equation can be generalized and applied to a larger set of problems. Specifically, given k equations (or, as we call them in our paper, relations) with n variables (which we call generators), AIDN proceeds by creating a set of n neural networks and training these networks to satisfy the relations by casting the equations as an MSE loss function. The details are given in our paper [1].
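As a rough illustration of this general recipe (the function and variable names below are ours, not an API from the paper or its repo), each generator becomes a network and each relation becomes an MSE term comparing its two sides on sampled inputs:

```python
import torch
import torch.nn as nn

def make_net(in_dim, out_dim, width=64):
    # one network per generator
    return nn.Sequential(
        nn.Linear(in_dim, width), nn.ReLU(),
        nn.Linear(width, width), nn.ReLU(),
        nn.Linear(width, out_dim),
    )

def aidn_loss(nets, relations, x):
    # nets: dict mapping generator names to networks
    # relations: list of (lhs, rhs) callables taking (nets, x) and returning tensors
    # x: a batch of points sampled from the common domain
    mse = nn.MSELoss()
    return sum(mse(lhs(nets, x), rhs(nets, x)) for lhs, rhs in relations)

# Toy example: two generators a, b on R^d subject to the single relation a(b(x)) = b(a(x)).
d = 2
nets = {"a": make_net(d, d), "b": make_net(d, d)}
relations = [
    (lambda N, x: N["a"](N["b"](x)), lambda N, x: N["b"](N["a"](x))),
]

params = [p for net in nets.values() for p in net.parameters()]
opt = torch.optim.Adam(params, lr=1e-3)
for step in range(1000):
    x = torch.rand(128, d)  # sample points from the common domain (here the unit square)
    loss = aidn_loss(nets, relations, x)
    opt.zero_grad()
    loss.backward()
    opt.step()
```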
Where to use AIDN?
The algorithm we proposed is rather general and can be applied in many areas. We have only scratched the surface by explaining its applicability to the Yang-Baxter equation. Today, the Yang-Baxter equation is considered a cornerstone in several areas of physics and mathematics, with applications to quantum mechanics, algebraic geometry, and quantum groups. In theory, given enough computational power, AIDN is capable of finding a solution of any algebraic set of equations, provided a solution of such a system exists. In our paper [1], we demonstrate many other applications, including finding new quantum invariants.
Limitations and conclusion
An important remark about the performance of AIDN concerns the difficulty of training when there are many generators and relations. While modern optimization paradigms such as SGD allow one to train a model on high-dimensional data, we found that training multiple networks associated with an algebraic structure that has many generators, many relations, and high-dimensional data is difficult. This can potentially be addressed with a better hyperparameter search and a more suitable optimization scheme.
Although there is no theoretical guarantee that the desired neural networks exist, we consistently found in our experiments that AIDN achieves good results and finds the desired functions, given enough expressive power for the networks and enough sample points from their domains. This observation is consistent with other open research questions in theoretical deep learning [4] concerning the loss landscape of a deep net.
Code
The implementation of the above example, along with many others, is available in our repo.
Refs
[1] Mustafa Hajij, et al. Algebraically-Informed Deep Networks (AIDN): A Deep Learning Approach to Represent Algebraic Structures. arXiv preprint arXiv:2012.01141, 2020.
[2] Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Physics Informed Deep Learning (Part I): Data-Driven Solutions of Nonlinear Partial Differential Equations. arXiv preprint arXiv:1711.10561, 2017.
[3] Yang Song, Chenlin Meng, Renjie Liao, and Stefano Ermon. Nonlinear Equation Solving: A Faster Alternative to Feedforward Computation. arXiv preprint arXiv:2002.03629, 2020.
[4] Ravid Shwartz-Ziv and Naftali Tishby. Opening the Black Box of Deep Neural Networks via Information. arXiv preprint arXiv:1703.00810, 2017.
[5] S. K. Jeswal and Snehashish Chakraverty. Solving Transcendental Equation Using Artificial Neural Network. Applied Soft Computing, 73:562–571, 2018.