What’s new in Graph ML?

Denoising Diffusion Generative Models in Graph ML

Is Denoising Diffusion all you need?

Michael Galkin
Towards Data Science
8 min read · Nov 26, 2022


The breakthrough in Denoising Diffusion Probabilistic Models (DDPM) happened about two years ago. Since then, we have observed dramatic improvements in generation tasks: GLIDE, DALL-E 2, Imagen, and Stable Diffusion for images, Diffusion-LM in language modeling, diffusion for video sequences, and even diffusion for reinforcement learning.

Diffusion might be the biggest trend in Graph ML in 2022, particularly when applied to drug discovery, molecule and conformer generation, and quantum chemistry in general. These models are often paired with the latest advancements in equivariant GNNs.

Molecule generation. Generated with Stable Diffusion 2

The Basics: Diffusion and Diffusion on Graphs

Let’s recapitulate the basics of diffusion models using the Equivariant Diffusion paper by Hoogeboom et al. as an example, with as few equations as possible 😅

Forward and backward diffusion processes. The forward process q(z|x,h) gradually adds noise to the graph until it becomes Gaussian noise. The backward process p(x,h|z) starts from Gaussian noise and gradually denoises it until it becomes a valid graph. Source: Hoogeboom, Satorras, Vignac, and Welling.
  • Input: a graph (N,E) with N nodes and E edges
  • Node features often have two parts: z=[x,h] where x ∈ R³ are 3D coordinates and h ∈ R^d are categorical features like atom types
  • (Optional) Edge features are bond types
  • Output: a graph (N,E) with nodes, edges, and corresponding features
  • Forward diffusion process q(z_t | x,h): at each time step t, inject noise into the features so that at the final step T they become white noise
  • Reverse diffusion process p(z_{t-1} | z_t): at each step, ask the model to predict the noise and “subtract” it from the input so that at the final step t=0 we have a new, valid generated graph
  • A denoising neural network learns to predict the injected noise (see the training sketch after this list)
  • Denoising diffusion is known to be equivalent to score-based matching [Song and Ermon (2019) and Song et al. (2021)], where a neural network learns to predict the score ∇_x log p_t(x) of the diffused data. The score-based perspective describes the forward/reverse processes with Stochastic Differential Equations (SDEs) driven by the Wiener process
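
To make the recipe concrete, here is a minimal PyTorch sketch of the standard DDPM training step on a graph’s node features. Everything here (the toy denoiser, the linear schedule, the shapes) is an illustrative assumption, not the EDM implementation; a real model would plug in an equivariant GNN as the denoiser.

```python
import torch
import torch.nn as nn

# Minimal DDPM-style training step on node features; illustrative only.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)   # cumulative signal retention

def ddpm_loss(denoiser, z0):
    """z0: clean node features of one graph, shape [num_nodes, feat_dim]."""
    t = torch.randint(0, T, (1,))                # sample a random timestep
    eps = torch.randn_like(z0)                   # injected Gaussian noise
    z_t = alphas_bar[t].sqrt() * z0 + (1 - alphas_bar[t]).sqrt() * eps
    eps_hat = denoiser(z_t, t)                   # network predicts the noise
    return ((eps - eps_hat) ** 2).mean()         # simple denoising objective

class ToyDenoiser(nn.Module):
    """Stand-in for the equivariant GNN used in real graph diffusion models."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 64), nn.SiLU(),
                                 nn.Linear(64, dim))
    def forward(self, z_t, t):
        t_feat = t.float().expand(z_t.size(0), 1) / T  # normalized timestep
        return self.net(torch.cat([z_t, t_feat], dim=-1))

loss = ddpm_loss(ToyDenoiser(8), torch.randn(5, 8))
loss.backward()
```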

Emiel Hoogeboom, Victor Garcia Satorras, Clément Vignac, Max Welling. Equivariant Diffusion for Molecule Generation in 3D. ICML 2022. GitHub

The work introduces an equivariant diffusion model (EDM) for molecule generation that has to maintain E(3) equivariance over atom coordinates x (with respect to rotation, translation, and reflection), while node features h (such as atom types) remain invariant. Importantly, atoms have different feature modalities: atom charge is an ordinal integer, atom types are one-hot categorical features, and atom coordinates are continuous features, so the authors design feature-specific noising processes and loss functions, and scale input features for training stability.

EDM employs an equivariant E(n) GNN as the neural network that predicts noise from the input features and the time step. At inference time, we first sample the desired number of atoms M, then we can condition EDM on a desired property c and ask EDM to generate molecules (defined by features x and h) as x, h ~ p(x,h | c, M). A generic version of this sampling loop is sketched below.
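
For intuition, the reverse process can be written as a generic ancestral sampling loop, reusing T, betas, and alphas_bar from the training sketch above. Conditioning on the property c and sampling the number of atoms M are omitted for brevity; this is not the exact EDM sampler.

```python
@torch.no_grad()
def sample(denoiser, num_atoms, feat_dim):
    """Generic ancestral DDPM sampling; a sketch, not the EDM sampler."""
    z = torch.randn(num_atoms, feat_dim)              # start from pure noise
    for t in reversed(range(T)):
        eps_hat = denoiser(z, torch.tensor([t]))      # predicted noise
        alpha_t = 1.0 - betas[t]
        z = (z - betas[t] / (1 - alphas_bar[t]).sqrt() * eps_hat) / alpha_t.sqrt()
        if t > 0:
            z = z + betas[t].sqrt() * torch.randn_like(z)  # inject fresh noise
    return z   # the real model decodes this into coordinates x and types h
```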

Experimentally, EDM outperforms normalizing flow- and VAE-based approaches by a large margin in terms of negative log-likelihood, molecule stability, and uniqueness. Ablations demonstrate that the equivariant GNN encoder is crucial: replacing it with a standard MPNN leads to significant performance drops.

Diffusion-based generation visualization. Source: Twitter

DiGress: Diffusion for Graph Generation

Clément Vignac, Igor Krawczuk, Antoine Siraudin, Bohan Wang, Volkan Cevher, Pascal Frossard. DiGress: Discrete Denoising Diffusion for Graph Generation. GitHub

DiGress by Clément Vignac, Igor Krawczuk, and the EPFL team is an unconditional graph generation model (although with the possibility to incorporate a score-based function for conditioning on graph-level properties like energy). DiGress is a discrete diffusion model, that is, it operates on discrete node types (like atom types C, N, O) and edge types (like single / double / triple bonds), where adding noise to a graph corresponds to multiplication with a transition matrix (from one type to another) whose entries are derived from marginal type probabilities in the training set (see the sketch below). The denoising neural net is a modified Graph Transformer. DiGress works for many graph families: planar graphs, SBMs, and molecules. The code is available, and check out the video from the LoGaG reading group presentation!
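
A minimal sketch of the discrete noising step, assuming a single transition matrix built from marginal type probabilities; the names and the schedule value are illustrative, not the DiGress code.

```python
import torch

def transition_matrix(m, alpha):
    """Q = alpha * I + (1 - alpha) * 1 m^T; each row is a valid distribution."""
    K = m.numel()
    return alpha * torch.eye(K) + (1 - alpha) * m.expand(K, K)

def noise_node_types(x_onehot, m, alpha_bar_t):
    """Sample noisy types from q(x_t | x_0) = Categorical(x_0 Q̄_t)."""
    probs = x_onehot @ transition_matrix(m, alpha_bar_t)
    return torch.distributions.Categorical(probs=probs).sample()

m = torch.tensor([0.7, 0.2, 0.1])            # marginal type probs, e.g., C/N/O
x0 = torch.eye(3)[torch.tensor([0, 0, 1])]   # three atoms: C, C, O (one-hot)
print(noise_node_types(x0, m, alpha_bar_t=0.5))
```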

DiGress diffusion process. Source: Vignac, Krawczuk, et al.

GeoDiff and Torsional Diffusion: Molecular Conformer Generation

Given a molecule with 3D coordinates of its atoms, conformer generation is the task of generating another set of valid 3D coordinates in which the molecule can exist. Recently, we have seen GeoDiff and Torsional Diffusion apply the diffusion framework to this task.

Minkai Xu, Lantao Yu, Yang Song, Chence Shi, Stefano Ermon, Jian Tang. GeoDiff: A Geometric Diffusion Model for Molecular Conformation Generation. ICLR 2022. GitHub

GeoDiff is an SE(3)-equivariant diffusion model for generating conformers of given molecules. Diffusion is applied to the 3D coordinates, which gradually get transformed into Gaussian noise (the forward process). The reverse process denoises a random sample into a valid set of atomic coordinates. GeoDiff defines an equivariant diffusion framework in Euclidean space (which postulates what kind of noise can be added) and applies an equivariant GNN as the denoising model. The denoising GNN, a Graph Field Network, is an extension of rather standard EGNNs (a simplified layer is sketched below). For the first time, GeoDiff showed how much better diffusion models are compared to normalizing flows and VAE-based models 💪
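
Here is a stripped-down E(n)-equivariant message-passing layer in the spirit of EGNN (Satorras et al.); a sketch under simplifying assumptions, not GeoDiff’s Graph Field Network.

```python
import torch
import torch.nn as nn

class EGNNLayer(nn.Module):
    """Simplified EGNN layer: invariant feature updates, equivariant coords."""
    def __init__(self, h_dim):
        super().__init__()
        self.phi_e = nn.Sequential(nn.Linear(2 * h_dim + 1, h_dim), nn.SiLU())
        self.phi_x = nn.Linear(h_dim, 1)     # invariant scalar per edge
        self.phi_h = nn.Sequential(nn.Linear(2 * h_dim, h_dim), nn.SiLU())

    def forward(self, h, x, edge_index):
        src, dst = edge_index                     # messages flow src -> dst
        diff = x[src] - x[dst]                    # equivariant directions
        d2 = (diff ** 2).sum(-1, keepdim=True)    # invariant squared distances
        m = self.phi_e(torch.cat([h[src], h[dst], d2], dim=-1))
        # Equivariant coordinate update: direction scaled by an invariant.
        x = x.index_add(0, dst, diff * self.phi_x(m))
        agg = torch.zeros_like(h).index_add(0, dst, m)   # aggregate messages
        h = self.phi_h(torch.cat([h, agg], dim=-1))      # invariant update
        return h, x

# Toy usage: 4 nodes, 8-dim features, 2 directed edges.
h, x = torch.randn(4, 8), torch.randn(4, 3)
edges = torch.tensor([[0, 1], [2, 3]])            # pairs (src, dst)
h, x = EGNNLayer(8)(h, x, edges.T)
```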

GeoDiff. Source: Xu et al.

Bowen Jing, Gabriele Corso, Jeffrey Chang, Regina Barzilay, Tommi Jaakkola. Torsional Diffusion for Molecular Conformer Generation. NeurIPS 2022. GitHub

While GeoDiff diffuses the 3D coordinates of atoms in Euclidean space, Torsional Diffusion proposes a neat way to perturb torsion angles of freely rotatable bonds in molecules. Since the number of such rotatable bonds is always much smaller than the number of atoms (on average in GEOM-DRUGS, 44 atoms vs 8 torsion angles per molecule), generation can potentially be much faster. The tricky part is that torsion angles do not form a Euclidean space but rather a hypertorus (a donut 🍩), so adding Gaussian noise to coordinates won’t work. Instead, the authors design a novel perturbation kernel as the wrapped normal distribution (a Gaussian on the real line wrapped modulo 2π); see the sketch below. Torsional Diffusion applies the score-based perspective to training and generation, where the score model has to be SE(3)-invariant and sign-equivariant. The score model is a variation of the Tensor Field Network.
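
A minimal sketch of perturbing torsion angles with a wrapped Gaussian; the paper’s score computation on the torus is omitted, and the names here are illustrative.

```python
import math
import torch

def perturb_torsions(tau, sigma):
    """tau: [m] torsion angles in radians; add Gaussian noise, wrap onto the torus."""
    noise = sigma * torch.randn_like(tau)
    return (tau + noise + math.pi) % (2 * math.pi) - math.pi   # wrap to [-pi, pi)

tau = torch.tensor([0.1, 3.0, -2.5])   # e.g., a molecule with 3 rotatable bonds
print(perturb_torsions(tau, sigma=0.5))
```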

Experimentally, Torsional Diffusion indeed works faster: it needs only 5–20 steps compared to the 5000 steps of GeoDiff, and it is currently the SOTA in conformer generation 🚀

Torsional Diffusion. Source: Jing, Corso, et al.

DiffDock: Diffusion for Molecular Docking

Gabriele Corso, Hannes Stärk, Bowen Jing, Regina Barzilay, Tommi Jaakkola. DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking. GitHub

DiffDock is a score-based generative model for molecular docking, i.e., given a ligand and a protein, predicting how the ligand binds to the target protein. DiffDock runs the diffusion process over translations T(3), rotations SO(3), and torsion angles SO(2)^m in the product space: (1) the position of the ligand with respect to the protein (the binding pocket is unknown in advance, so this is blind docking), (2) the rotational orientation of the ligand, and (3) the torsion angles of the conformation (see Torsional Diffusion above for reference). A pose update on this product space is sketched below.
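
A toy sketch of a pose update on this product space: translate the ligand and rotate it about its centroid (torsion updates are omitted; see the wrapped-Gaussian sketch above). The helper names are illustrative assumptions, not DiffDock’s code.

```python
import math
import torch

def rotation_matrix(axis, angle):
    """Rodrigues' formula; axis is a unit 3-vector given as a list of floats."""
    x, y, z = axis
    K = torch.tensor([[0.0, -z, y],
                      [z, 0.0, -x],
                      [-y, x, 0.0]])
    return torch.eye(3) + math.sin(angle) * K + (1 - math.cos(angle)) * (K @ K)

def apply_pose_update(coords, translation, axis, angle):
    """coords: [n_atoms, 3]; rotate about the ligand centroid, then translate."""
    center = coords.mean(dim=0, keepdim=True)
    R = rotation_matrix(axis, angle)
    return (coords - center) @ R.T + center + translation

coords = torch.randn(12, 3)                       # a toy ligand point cloud
new_pose = apply_pose_update(coords, torch.tensor([1.0, 0.0, 0.0]),
                             axis=[0.0, 0.0, 1.0], angle=0.3)
```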

DiffDock trains two models: a score model that predicts the actual coordinates and a confidence model that estimates the likelihood of a generated prediction. Both models are SE(3)-equivariant networks over point clouds, but the bigger score model (in terms of parameter count) works on protein residues represented by alpha-carbons (initialized from the now-famous ESM2 protein LM), while the confidence model uses fine-grained atom representations. Initial ligand structures are generated by RDKit. DiffDock dramatically improves prediction quality, and you can even upload your own proteins (PDB) and ligands (SMILES) in the online demo on Hugging Face Spaces to test it out!
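
Conceptually, inference combines the two models like this; score_model_sample and confidence_model are hypothetical callables standing in for the real networks, not the released API.

```python
def dock(score_model_sample, confidence_model, ligand, protein, n_samples=10):
    """Sample several candidate poses, then keep the highest-confidence one."""
    poses = [score_model_sample(ligand, protein) for _ in range(n_samples)]
    scores = [confidence_model(pose, protein) for pose in poses]
    best = max(range(n_samples), key=lambda i: scores[i])
    return poses[best]
```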

DiffDock intuition. Source: Corso, Stärk, Jing, et al.

DiffSBDD: Diffusion for Generating Novel Ligands

Arne Schneuing, Yuanqi Du, Charles Harris, Arian Jamasb, Ilia Igashov, Weitao Du, Tom Blundell, Pietro Lió, Carla Gomes, Max Welling, Michael Bronstein, Bruno Correia. Structure-based Drug Design with Equivariant Diffusion Models. GitHub

DiffSBDD is a diffusion model for generating novel ligands conditioned on the protein pocket. DiffSBDD can be implemented with two approaches: (1) pocket-conditioned ligand generation, where the pocket is fixed; (2) inpainting-like generation that approximates the joint distribution of pocket-ligand pairs (sketched below). In both approaches, DiffSBDD relies on a tuned equivariant diffusion model (EDM, ICML 2022) with EGNN as the denoising model. Practically, ligands and proteins are represented as point clouds with categorical features and 3D coordinates (proteins can be alpha-carbon residues or full atoms with one-hot encoding of residues; ESM2 could be used here in the future), so diffusion is performed over the 3D coordinates, ensuring equivariance.
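
A minimal sketch of the inpainting idea, assuming an EDM-like model: denoise_step, scale, and sigma are hypothetical names for one reverse-diffusion update and the forward-noising schedule, not the DiffSBDD code.

```python
import torch

@torch.no_grad()
def inpaint_sample(denoise_step, scale, sigma, pocket, n_ligand, T):
    """pocket: [n_p, 3] known coordinates; returns [n_ligand, 3] new ligand."""
    n_p = pocket.size(0)
    z = torch.randn(n_p + n_ligand, 3)             # joint point cloud
    for t in reversed(range(T)):
        z = denoise_step(z, t)                     # one reverse step
        # Overwrite the pocket part with a forward-noised copy of the truth,
        # so only the ligand part is actually generated.
        z[:n_p] = scale(t) * pocket + sigma(t) * torch.randn_like(pocket)
    return z[n_p:]
```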

DiffSBDD. Source: Schneuing, Du, et al.

DiffLinker: Diffusion for Generating Molecular Linkers

Ilya Igashov, Hannes Stärk, Clément Vignac, Victor Garcia Satorras, Pascal Frossard, Max Welling, Michael Bronstein, Bruno Correia. Equivariant 3D-Conditional Diffusion Models for Molecular Linker Design. GitHub

DiffLinker is a diffusion model for generating molecular linkers conditioned on 3D fragments. While previous models are autoregressive (hence not permutation equivariant) and can only link 2 fragments, DiffLinker generates the whole structure at once and can link 2+ fragments. In DiffLinker, each point cloud is conditioned on the context (all other known fragments and/or the protein pocket), and the context is usually kept fixed, as sketched below. The diffusion framework is similar to EDM but is now conditioned on 3D data rather than on scalars. The denoising model is the same EGNN. Interestingly, DiffLinker has an additional module to predict the linker size (the number of linker atoms), so you don’t have to specify it beforehand.
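
A sketch of context-conditioned generation: the fragment coordinates stay fixed while only the linker part is denoised. Here reverse_step is a hypothetical one-step denoiser (e.g., the DDPM update sketched earlier), not the DiffLinker API.

```python
import torch

@torch.no_grad()
def generate_linker(reverse_step, fragments, n_linker, T):
    """fragments: [n_frag, 3] fixed context coordinates; returns linker coords."""
    z = torch.randn(n_linker, 3)                       # noisy linker coordinates
    for t in reversed(range(T)):
        full = torch.cat([fragments, z], dim=0)        # context + noisy linker
        z = reverse_step(full, t)[fragments.size(0):]  # denoise only the linker
    return z
```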

DiffLinker. Source: Igashov et al.

Learn More

  • SMCDiff for generating protein scaffolds conditioned on the desired motif (also with EGNN).
  • Generally, in graph and molecule generation we’d like to support some discreteness, so any improvements to discrete diffusion are very welcome, e.g., Richemond, Dieleman, and Doucet propose a new simplex diffusion for categorical data with the Cox-Ingersoll-Ross SDE (rare find!).
  • Discrete diffusion is also studied for text generation in the recent DiffusER.
  • Hugging Face maintains the 🧨 Diffusers library and has started an open course on Diffusion Models; check them out for practical implementation tips (see the example after this list)
  • Check the recordings of the CVPR 2022 tutorial on diffusion models by Karsten Kreis, Ruiqi Gao, and Arash Vahdat
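
For a taste of the Diffusers API, sampling from a pretrained unconditional DDPM takes a few lines; a sketch assuming the google/ddpm-cat-256 checkpoint, so consult the library docs for current usage.

```python
from diffusers import DDPMPipeline

# Load a pretrained unconditional image DDPM and draw one sample.
pipe = DDPMPipeline.from_pretrained("google/ddpm-cat-256")
image = pipe(num_inference_steps=1000).images[0]
image.save("ddpm_sample.png")
```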

We’ll spare your browser tabs for now 📚 but do expect more diffusion models in Geometric DL!

A special thanks goes to Hannes Stärk and Ladislav Rampášek for proofreading the post! Follow Hannes, Ladislav, and me on Twitter, or subscribe to the GraphML channel in Telegram.
