
Teaching AI how to do Quantum Mechanics

Pictured top left: Schrödinger, Einstein and Pople. Pictured bottom middle: Hinton and Bengio

Computing quantum mechanical properties of compounds, such as their atomization energy, accurately can take hours to weeks using conventional state-of-the-art methods. This article explores the use of deep neural networks to compute these properties in a matter of seconds, with 99.8% accuracy.

Training a DNN to predict atomization energy based on a Coulomb matrix input. The average error is 3 kcal/mol, equating to a mind-boggling 99.8% accuracy

After the discovery of quantum mechanics in the 1920s, scientists quickly realized that the fundamental equations that arose from the theory could only be solved for the simplest systems (like the hydrogen atom…). In the words of Nobel Laureate Paul Dirac himself:

"The underlying physical laws necessary for the mathematical theory of a large part of physics and the whole of chemistry are thus completely known, and the difficulty is only that the exact application of these laws leads to equations much too complicated to be soluble. It therefore becomes desirable that approximate practical methods of applying quantum mechanics should be developed, which can lead to an explanation of the main features of complex atomic systems without too much computation" – Paul Dirac, 1929

Rapid development then ensued to find new approximate methods to deal with this problem of complexity. Two state-of-the-art methods arose from this: Hartree-Fock Theory (HF) and Density Functional Theory (DFT).

In particular, breakthroughs were made in DFT in the 1990s that remain state-of-the-art.

Today, DFT is a cornerstone of computational chemistry and materials science. It can be used, for example, to accurately model the electron density surface of large molecules like buckminsterfullerene:

Electron density map of Buckminsterfullerene (C60)

The problem with DFT and HF is that to get accurate results, it still takes a long time: hours, days or even weeks.

We are going to use the predictive power of deep neural networks to cut this time down to a couple of seconds.

A Google Colab notebook accompanying this post is available if you want to read along and dive deeper into some of the processes mentioned here.

Finally, due credit goes to G. Montavon et al. for originally publishing this work. A great find indeed, and they are continuing to do some interesting stuff.


1. What is the Coulomb Matrix, and why is it important?

The Coulomb Matrix is a representation that essentially stores all the information about how the atoms in a molecule interact with each other.

To be precise, it stores the pairwise electrostatic potential energy of all pairs of atoms in a molecule.

Each atom of a molecule has a charge associated with it. Furthermore, each atom has a set of 3D coordinates representing its position in space. So, for every pair of atoms, we note down the distance between the two atoms. The Coulomb Matrix is then simply:

Definition of the Coulomb Matrix
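Concretely, the diagonal entries are 0.5·Zᵢ^2.4 (a fit to the energy of the free atom) and the off-diagonal entries are ZᵢZⱼ/|Rᵢ − Rⱼ|, the Coulomb repulsion between nuclei i and j. Here is a minimal numpy sketch of that definition (the example molecule at the end is hypothetical, and distances are assumed to be in atomic units):

```python
import numpy as np

def coulomb_matrix(Z, R):
    """Coulomb matrix of one molecule.

    Z : (n,) nuclear charges (atomic numbers)
    R : (n, 3) atomic positions, assumed to be in atomic units
    """
    n = len(Z)
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                C[i, j] = 0.5 * Z[i] ** 2.4                          # free-atom term
            else:
                C[i, j] = Z[i] * Z[j] / np.linalg.norm(R[i] - R[j])  # nuclear repulsion
    return C

# Hypothetical example: H2, two protons roughly 1.4 Bohr apart
print(coulomb_matrix(np.array([1, 1]),
                     np.array([[0.0, 0.0, 0.0], [1.4, 0.0, 0.0]])))
```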

We want to use the Coulomb Matrix as our data input because it has predictive power.

In fact, given the Coulomb Matrix, we can run DFT or HF calculations to accurately compute complex quantum mechanical properties (the Coulomb matrix directly enters the Schrödinger equation – refer back to the front image).

We are essentially using the Coulomb matrix to implicitly teach the machine learning model about the rules of Quantum Mechanics.


2. What are the issues with the Coulomb Matrix?

There are in fact 3 issues. Luckily, these can all be tackled to still yield a very useful model.

2.1 Differently-sized molecules have differently-sized Coulomb Matrices. This is not compatible with machine learning.

The solution to this problem is to pad the edges of the Coulomb matrices of smaller molecules with zeros, so that every molecule ends up with a matrix of the same fixed size (a quick sketch of this follows below).
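A rough sketch of that padding step (the target size here is just a placeholder; in practice the matrices are padded to the size of the largest molecule in the dataset):

```python
import numpy as np

def pad_coulomb_matrix(C, size=23):
    """Zero-pad an n x n Coulomb matrix to a fixed size x size array."""
    padded = np.zeros((size, size))
    n = C.shape[0]
    padded[:n, :n] = C
    return padded
```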

Luckily, we are using data in which padded Coulomb matrices have already been generated. We don’t need to worry about this problem for now.

2.2 There exist several (N! to be precise) ways to label your molecule – each scheme of labelling will result in a different Coulomb Matrix.

Different atom labeling leads to different Coulomb matrices of the same molecule

The solution to this is to sort the rows and columns of the matrix by their norms, so that the "largest" atoms come first and every molecule gets a canonical ordering.

There is a catch though – when sorting, we actually add a bit of random noise, so that each sort comes out slightly different. This exposes the neural network to several slight variations of the same Coulomb matrix, making it more robust and preventing overfitting.
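A hedged sketch of this trick, in the spirit of the original paper: sort rows and columns by their row norms, with a little Gaussian noise thrown in so that each draw gives a slightly different ordering (the noise scale here is an assumption you would tune):

```python
import numpy as np

def randomly_sorted_coulomb(C, noise=1.0, rng=None):
    """Reorder rows/columns of C by descending (noisy) row norm."""
    rng = rng or np.random.default_rng()
    row_norms = np.linalg.norm(C, axis=1)
    noisy = row_norms + rng.normal(0.0, noise, size=row_norms.shape)
    order = np.argsort(-noisy)          # largest noisy norm first
    return C[order][:, order]           # apply the same permutation to rows and columns
```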

2.3 A lot of label-relevant information is contained in the ordering

This problem can be circumvented by transforming the Coulomb matrix into a binary input (0s and 1s) by pushing it through an activation function. The details are not essential here; to learn more about them, as well as other points that have been skimmed over in this post, read the following: Learning Invariant Representations of Molecules for Atomization Energy Prediction

Converting the Coulomb Matrix into a Binary input
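As a very rough sketch (not necessarily the paper's exact scheme; the threshold spacing and count are assumptions), each matrix entry can be compared against a grid of thresholds and squashed with a sigmoid, so that one real number becomes a vector of values close to 0 or 1:

```python
import numpy as np

def binarize(C, theta=1.0, n_steps=10):
    """Expand each Coulomb-matrix entry into a quasi-binary vector.

    Every entry is compared against thresholds 0, theta, 2*theta, ...
    and squashed with a sigmoid, so each value ends up near 0 or 1.
    """
    x = C.reshape(-1, 1)                      # (n*n, 1) column of matrix entries
    thresholds = theta * np.arange(n_steps)   # (n_steps,) grid of cut-offs
    steps = 1.0 / (1.0 + np.exp(-(x - thresholds) / theta))
    return steps.ravel()                      # flat input vector for the DNN
```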

For the above two issues, the Input and Output classes in the accompanying notebook will sort it out. You can work through the code, or you could not do that and just trust it 😉

Stochastic augmentation and input processing functions

3. What is our data?

We use a dataset of about 7,000 molecules featuring several different functional groups.

For each molecule, the Coulomb Matrix has been calculated (this is easily doable in code; the coordinates of the atoms in each molecule were optimized using the MMFF94 force field – all fairly standard).

For each molecule, we also compute its atomization energy, a quantum mechanical property, using a DFT implementation. This is the quantity we are trying to predict, and we will use the computed values to train our neural network.
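For anyone who wants to poke at the data directly, this is very likely the publicly available QM7 set; a hedged loading sketch, assuming the qm7.mat file from quantum-machine.org and its usual keys ('X' for the padded Coulomb matrices, 'T' for atomization energies in kcal/mol):

```python
from scipy.io import loadmat

# Assumes qm7.mat has been downloaded from quantum-machine.org/datasets
data = loadmat("qm7.mat")
X = data["X"]            # padded Coulomb matrices, roughly (7165, 23, 23)
T = data["T"].ravel()    # atomization energies in kcal/mol
print(X.shape, T.shape)
```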

The flow of data from molecule to quantum energy prediction

4. What architecture does our deep neural network use?

We use a very simple architecture: 2 fully connected hidden layers, with 400 and 100 neurons respectively.

Weights were initialized with Xavier initialization.

There is a lot of scope for more complicated architectures (feel free to try them out and let me know if you find better ones) but this simple structure works for this problem.

We used something somewhere in between these two…

We use the Adam optimizer to minimize our MSE (mean-squared-error) loss function.

The neural network is implemented in Google’s TensorFlow library, and the code is in the Colab notebook.
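Not the notebook's exact code, but a minimal Keras-style sketch of this kind of network (the tanh activations and the input dimension are assumptions):

```python
import tensorflow as tf

def build_model(input_dim):
    """Two fully connected hidden layers (400 and 100 units), Xavier init."""
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(400, activation="tanh",
                              kernel_initializer="glorot_uniform",
                              input_shape=(input_dim,)),
        tf.keras.layers.Dense(100, activation="tanh",
                              kernel_initializer="glorot_uniform"),
        tf.keras.layers.Dense(1),                 # predicted atomization energy
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(), loss="mse")
    return model

# model = build_model(input_dim=...)  # input_dim depends on the binarized input size
```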


5. Results

If you want to delve deep into more of the details, check out the Google Colab notebook here.

You should be able to run it all with no problems, and the neural network should train very quickly (shoutout Google Colab for the free GPU).

The results of the regression are shown below. Pretty good, as you can see…

The straight line represents a perfect prediction while the scattered points represent predictions made by the DNN

6. Next Steps

To deep dive on some more of the concepts tackled in this blog post, go ahead and read the original paper in which this research appeared.

This research group has released some more papers expanding on this idea, so definitely check those out too.

If you enjoyed this, give the post a clap, connect with me on LinkedIn and tell me what you liked. See you next time!

