A REAL-WORLD APPLICATION OF SYMBOLIC REGRESSION
The new method derives accurate functionals (key ingredients of quantum mechanical calculations) in symbolic form, which makes them human-interpretable, cheap to compute, and easy to integrate into existing software for quantum mechanical calculations.
It’s no news that the giant Alphabet invests heavily in ML applications to science, through channels such as Google Research and DeepMind. While in chemistry and biology AlphaFold is by far its most famous project, DeepMind has also gone into quantum mechanical (QM) calculations (my blog entry here), and so has Google Research.
QM calculations are very important in chemistry, as they provide the highest level of detail about electron densities, distributions, and spin states in molecules and materials. These are the key elements required to model, understand, and predict chemical reactivity and physicochemical properties, none of which can be addressed with classical methods. The new work I comment on here comes from Google Research and also addresses ways to improve QM calculations. Specifically, Ma et al. developed a new method to derive symbolic, analytical forms of DFT functionals.
What is all this?
At their heart, QM calculations attempt to describe the electronic properties of molecules and materials from first principles, starting from the distributions, states and energies of their electrons. To do this, QM essentially needs to solve the many-body Schrödinger equation of interacting electrons.
Today, the most widely used method to carry out QM calculations is Density Functional Theory (DFT). DFT needs some way to treat how multiple electrons interact, which it does through exchange-correlation (XC) terms. In theory an exact XC term exists, but in practice its form is unknown. Hence, in most applications it is approximated, for example with analytical equations whose parameters are fitted to data or with neural networks trained on datasets of molecular or materials properties.
Most available XC functionals fall into one of two groups. Some consist of equations containing small numbers of fitted parameters, which run fast at the expense of quality in their results. Others are complex expressions with very large numbers of parameters, or even black-box neural networks, which produce more accurate results at the expense of calculation speed and interpretability. On top of that, neural networks and other XC terms not based on analytical forms are complicated to integrate into software packages for DFT calculations.
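To make the role of the XC term concrete, here is the standard textbook way of writing the DFT total energy, together with the simplest analytical approximation to the exchange part (the local density approximation, LDA, for a spin-unpolarized density in atomic units). These formulas are general background rather than anything taken from the work by Ma et al.; the symbols are the usual ones, with the electron density written as rho(r).

```latex
% Kohn-Sham decomposition of the total energy: everything that is not known
% exactly is collected in the exchange-correlation term E_xc.
E[\rho] = T_s[\rho]
        + \int v_{\mathrm{ext}}(\mathbf{r})\,\rho(\mathbf{r})\,d\mathbf{r}
        + E_{\mathrm{H}}[\rho]
        + E_{\mathrm{xc}}[\rho]

% Simplest analytical approximation to the exchange part (Dirac/LDA exchange):
E_{\mathrm{x}}^{\mathrm{LDA}}[\rho]
  = -\frac{3}{4}\left(\frac{3}{\pi}\right)^{1/3}
    \int \rho(\mathbf{r})^{4/3}\,d\mathbf{r}
```

Here T_s is the kinetic energy of non-interacting electrons, v_ext the external (nuclear) potential, and E_H the classical Hartree repulsion. More elaborate functionals multiply or extend integrands like the one above with further terms and fitted parameters.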
The work by Ma et al. takes the best of both worlds: it uses an ML method based on symbolic regression to build analytical equations that represent XC functionals from elementary mathematical instructions and other smaller, already existing functionals. The algorithm begins with a small pool of simple parameters, terms, and optionally small existing functionals. It then creates a population of candidate solutions and evolves them over generations, arriving at equations that combine all these ingredients into symbolic expressions that reproduce the dataset used for training.
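To give a feeling for what "evolving a population of symbolic expressions" means, here is a minimal toy sketch in Python. It is not the authors' code and it does not involve DFT at all: it simply evolves small expression trees built from x, constants, and the operators +, - and * so that they fit a made-up target, y = x*x + x. All names (random_expr, mutate, evolve, the population size, the size cap) are my own illustrative choices.

```python
import math
import random

# Elementary "instructions" the search may combine (an illustrative choice).
OPS = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b, "mul": lambda a, b: a * b}

def random_expr(rng, depth):
    """Build a random expression tree from x, constants, and binary operators."""
    if depth == 0 or rng.random() < 0.3:
        return ("x",) if rng.random() < 0.5 else ("const", rng.uniform(-2.0, 2.0))
    op = rng.choice(sorted(OPS))
    return (op, random_expr(rng, depth - 1), random_expr(rng, depth - 1))

def evaluate(expr, x):
    """Evaluate an expression tree at the point x."""
    if expr[0] == "x":
        return x
    if expr[0] == "const":
        return expr[1]
    return OPS[expr[0]](evaluate(expr[1], x), evaluate(expr[2], x))

def size(expr):
    """Number of nodes in the tree, used to cap expression complexity."""
    return 1 if expr[0] in ("x", "const") else 1 + size(expr[1]) + size(expr[2])

def mutate(rng, expr, depth=2):
    """Replace a randomly chosen subtree with a fresh random one."""
    if expr[0] in ("x", "const") or rng.random() < 0.3:
        return random_expr(rng, depth)
    if rng.random() < 0.5:
        return (expr[0], mutate(rng, expr[1], depth), expr[2])
    return (expr[0], expr[1], mutate(rng, expr[2], depth))

def loss(expr, data):
    """Mean squared error on the training data; non-finite errors count as infinite."""
    err = 0.0
    for x, y in data:
        d = evaluate(expr, x) - y
        err += d * d
    err /= len(data)
    return err if math.isfinite(err) else math.inf

def evolve(data, generations=40, pop_size=200, seed=0, max_size=25):
    """Keep the best quarter of each generation and refill with mutated copies."""
    rng = random.Random(seed)

    def score(expr):
        return loss(expr, data) if size(expr) <= max_size else math.inf

    population = [random_expr(rng, 3) for _ in range(pop_size)]
    history = []  # best training error seen in each generation
    for _ in range(generations):
        population.sort(key=score)
        history.append(score(population[0]))
        survivors = population[: pop_size // 4]
        children = [mutate(rng, rng.choice(survivors))
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    population.sort(key=score)
    history.append(score(population[0]))
    return population[0], history

# Toy training set sampled from the made-up target y = x*x + x.
data = [(i / 4, (i / 4) * (i / 4) + i / 4) for i in range(-8, 9)]
best, history = evolve(data)
print("best expression:", best)
print("training error per generation:", history[0], "->", history[-1])
```

Because the best candidates always survive into the next generation, the best training error can only stay the same or decrease. The real method works on far richer ingredients (density-dependent terms, existing functionals, fitted parameters), but the select-mutate-repeat loop is the general idea behind evolutionary symbolic regression.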
The symbolic representation of the XC functionals produced by this new method looks just like that of regular XC functionals used in QM software, so their integration into software packages is straightforward.
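As an illustration of why a closed-form functional is easy to plug into code, here is a small Python sketch that evaluates a textbook analytical functional, the LDA (Dirac) exchange for a spin-unpolarized density, on the exact density of the hydrogen atom. This is not GAS22 nor any functional from the paper, and the function names and the simple radial trapezoid integration are my own assumptions; real DFT packages use much more sophisticated grids.

```python
import math

# Dirac exchange constant in atomic units: (3/4) * (3/pi)^(1/3)
C_X = 0.75 * (3.0 / math.pi) ** (1.0 / 3.0)

def lda_exchange_energy_density(rho):
    """LDA exchange energy per unit volume for a spin-unpolarized density rho."""
    return -C_X * rho ** (4.0 / 3.0)

def hydrogen_density(r):
    """Exact ground-state electron density of the hydrogen atom (atomic units)."""
    return math.exp(-2.0 * r) / math.pi

def radial_integral(f, r_max=30.0, n=30000):
    """Trapezoid-rule integral of f(r) * 4*pi*r^2 from 0 to r_max."""
    h = r_max / n
    total = 0.0
    for i in range(n + 1):
        r = i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * f(r) * 4.0 * math.pi * r * r
    return total * h

# Sanity check: the density should integrate to one electron.
n_electrons = radial_integral(hydrogen_density)

# Exchange energy: integrate the closed-form energy density over space.
e_x = radial_integral(lambda r: lda_exchange_energy_density(hydrogen_density(r)))

print(f"electrons: {n_electrons:.4f}")
print(f"LDA exchange energy (Hartree): {e_x:.4f}")
```

The whole functional is one line of arithmetic on the density, which is the property that makes symbolic functionals straightforward to add to existing codes, in contrast to a neural network that needs its own runtime and weights.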

Applications
Ma et al named the procedure "Symbolic Functional Evolutionary Search", as this is what it does: it evolves a symbolic expression that describes the functional, and searches in the space of equation forms and parameters to optimize how well the training data are reproduced.
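Schematically, and only as my own way of summarizing the idea, such a search can be written as a joint optimization over the functional form F (chosen from the space of expressions the algorithm can build) and its numerical parameters theta. The squared-error measure and all symbols below are illustrative assumptions; the actual error measure and any weighting or regularization are defined in the paper.

```latex
% Schematic objective: find the form F and parameters \theta that best
% reproduce N reference values y_i^{ref} of the training set.
(F^{*}, \theta^{*}) =
  \arg\min_{F \in \mathcal{F},\; \theta}\;
  \frac{1}{N} \sum_{i=1}^{N}
  \left( y_i^{\mathrm{calc}}[F, \theta] - y_i^{\mathrm{ref}} \right)^{2}
```

Here y_i^calc is the property computed with the candidate functional and y_i^ref the corresponding reference value from the training dataset.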
As a first application, the authors demonstrate that their method can re-discover simple existing functionals from scratch, and can furthermore obtain novel, more accurate functionals evolved from simpler ones but retaining the simplicity allowed by symbolic regression.
Then, they applied their method to develop a novel functional that they dubbed "Google Accelerated Science 22" (GAS22), which performs better than the best-established alternative, exhibits good numerical stability, and is seamlessly integrable into existing QM software. On top of that, given its simplicity, GAS22 is amenable to all the interpretation methods normally applied to functionals to understand their workings and limitations.
Better-performing functionals are essential to improve the quality of QM calculations, while faster execution allows access to bigger systems, i.e. systems with the larger numbers of atoms usually required to treat biological molecules and pieces of materials. The new method, and possibly other developments underway in this branch of Google Research (and of course in the academic community and in smaller but well-established companies), are thus important for a future where scientists spend less time and money on experiments because they can first predict the results faster and with higher accuracy.
Related reads
The preprint in arXiv:
UPDATE: The preprint was accepted after peer review in Science Advances:
https://www.science.org/doi/10.1126/sciadv.abq0279
Related work by DeepMind:
DeepMind Strikes Back, Now Tackling Quantum Mechanical Calculations
Other example applications of symbolic regression to science:
www.lucianoabriata.com I write and photoshoot about everything that lies in my broad sphere of interests: nature, science, technology, programming, etc.