Google’s AI subsidiaries DeepMind and Isomorphic Labs are making waves in the scientific community… again. This time, it is with the release of AlphaFold 3, a new AI model that predicts molecular structures with unprecedented accuracy and is not limited to proteins like the successful (and truly game-changing for biology) AlphaFold 2. Indeed, AlphaFold 3 handles proteins and their complexes with DNA, RNA, ligands, ions, and more, promising to again revolutionize our understanding of biology and, in this case, opening up new ways to expedite drug discovery.
- Introduction
- Rich for computer scientists and bioinformaticians too
- Changing how we study biology, discover drugs, and advance biotechnology
- Using AlphaFold 3 online in a few clicks
- A bright future, but only for a few?
Once more, DeepMind and Isomorphic Labs, two of Google’s core AI subsidiaries, have stunned the scientific world. This time, it’s with the launch of AlphaFold 3, a brand-new AI model that, following the successful (and genuinely revolutionary for biology) AlphaFold 2, predicts molecular structures with even higher accuracy and isn’t restricted to proteins. In fact, AlphaFold 3 handles proteins and their interactions with ligands, ions, DNA, RNA, and more. It holds the potential to once more transform our knowledge of biology and, in this instance, to provide new ways to accelerate the development of new drugs, actually threatening the classical methods and software used to test in silico how drugs bind to their molecular targets.
Below is a link to the paper. Read on for more accessible explanations of its main topics, broken down into parts of interest to computer scientists, bioinformaticians, and biology researchers; and don’t miss the last section, which reflects on the fact that this time the model’s code is closed.
Accurate structure prediction of biomolecular interactions with AlphaFold 3 – Nature
Rich for computer scientists and bioinformaticians too
As in the Nature paper presenting AlphaFold 2 back in July 2021, the new paper reporting AlphaFold 3 is not just a "marvel" for biologists, but also for computer scientists, AI researchers, and bioinformaticians who apply AI technology in their work. Indeed, the paper presents an advanced architecture featuring an enhanced Evoformer module and a diffusion network, which are not totally innovative in the field but are really at its frontier, and moreover, they come with some new tweaks.
Let’s go through the new and improved elements one by one.
First, AlphaFold 3 modifies the architecture of AlphaFold 2 to handle a wider range of chemical structures more efficiently, reducing the need for special-case solutions. It can read protein sequences, just like AlphaFold 2, and also nucleic acid sequences (that is, DNA or RNA bases), some small molecules, and some ions. The latter two in particular require special handling (as opposed to proteins and nucleic acids, which can be described simply by their sequences). This intrinsically requires a different architecture and even a different way of tokenizing the input, as I already anticipated here when DeepMind and an academic lab posted blog articles announcing that they would soon release such models:
AlphaFold and Other AI Tools for Molecular Structure Go Beyond Proteins
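To picture why tokenization has to change, here is a minimal sketch, assuming the scheme I understand the paper to use: one token per residue or nucleotide for sequence-defined polymers, and one token per heavy atom for ligands, which have no sequence. The `Token` class, the function names, and the example inputs are hypothetical illustrations, not AlphaFold 3 code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    kind: str   # "protein", "dna", or "ligand_atom" (hypothetical labels)
    label: str  # one-letter residue/nucleotide code, or an atom name


def tokenize_polymer(kind, sequence):
    """One token per residue or nucleotide of a sequence-defined polymer."""
    return [Token(kind, letter) for letter in sequence]


def tokenize_ligand(atom_names):
    """One token per heavy atom, because a ligand has no sequence to read."""
    return [Token("ligand_atom", name) for name in atom_names]


def tokenize_complex(protein, dna, ligand_atoms):
    """Concatenate all components into a single token list for the model."""
    return (
        tokenize_polymer("protein", protein)
        + tokenize_polymer("dna", dna)
        + tokenize_ligand(ligand_atoms)
    )


# Made-up inputs: an 8-residue peptide, a 4-base DNA strand, a 4-atom ligand.
tokens = tokenize_complex("MKTAYIAK", "ACGT", ["C1", "C2", "N1", "O1"])
print(len(tokens))  # 8 protein + 4 DNA + 4 ligand-atom tokens = 16
```

The point of the sketch is only that all molecule types end up in one flat list of tokens, so a single network can treat them together instead of needing a special case per molecule type.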
Continuing with changes in the model’s architecture, AlphaFold 2’s Evoformer has now been replaced by a Pairformer, which simplifies the handling of multiple sequence alignments by focusing on pairwise representations. This streamlines the model’s architecture and allows for concurrent treatment of all different kinds of molecular inputs.
AlphaFold 2’s Evoformer was in charge of processing Multiple Sequence Alignments of proteins related to the one being modeled. This is key in all current AI models for structure prediction, and AlphaFold 3 is no exception. However, its Pairformer uses a simpler and smaller embedding block, and fewer blocks overall, which makes it all run faster and more smoothly (which, you guessed it, could potentially miss information… to be tested!).
Besides, as many in the community expected, AlphaFold 3 had to implement diffusion models that can help to accurately position atoms. The application of diffusion models to molecular modeling is not totally new, but has so far mainly been used for designing molecules. AlphaFold 3 instead uses one such module to predict raw atom coordinates directly, a departure from the previous reliance on frames and torsion angles. This in turn eliminates the need for stereochemical losses and special handling of bonding patterns – but as the paper itself explains, it isn’t yet perfect and produces hallucinations that end up distorting the shapes of the predicted molecules in obviously impossible ways. AlphaFold 3 is thus somewhat more of a generative tool.
One interesting point of the new AlphaFold is that, while we have always assumed that AI models handling molecules must be invariant to translations and rotations, this was rethought in a radically new way for version 3. Indeed, the new model does not require invariance or equivariance to global rotations and translations, which simplifies the architecture. Rather, the diffusion model is itself trained to denoise atomic coordinates, learning protein structure at various length scales, from local stereochemistry to large-scale structure, regardless of where the atoms sit in space. In a way, it is like a human looking at a molecular structure directly: the simplification happens naturally and internally, and it is possible thanks to the model’s ability to learn the structure of proteins directly from raw atomic coordinates.
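To make this idea concrete, here is a toy sketch of how one training example for such a denoiser could be built: the true coordinates are randomly rotated and shifted as a whole, Gaussian noise is added, and the network would be asked to recover the clean (but arbitrarily placed) coordinates. This is my own illustration under simple assumptions, not AlphaFold 3’s actual training code; the function names, the noise level, and the four "atoms" are made up.

```python
import math
import random


def random_rotation(rng):
    """3x3 rotation matrix built from a random unit quaternion."""
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(c * c for c in q))
    w, x, y, z = (c / norm for c in q)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def augment(coords, rng, max_shift=10.0):
    """Apply one random global rotation and translation to every atom."""
    rot = random_rotation(rng)
    shift = [rng.uniform(-max_shift, max_shift) for _ in range(3)]
    return [
        tuple(sum(rot[i][j] * p[j] for j in range(3)) + shift[i] for i in range(3))
        for p in coords
    ]


def add_noise(coords, sigma, rng):
    """Perturb every coordinate with independent Gaussian noise."""
    return [tuple(c + rng.gauss(0.0, sigma) for c in p) for p in coords]


def training_pair(coords, sigma, rng):
    """Return (noisy input, clean target) for one denoising training step."""
    target = augment(coords, rng)
    return add_noise(target, sigma, rng), target


def pairwise_distances(coords):
    """All interatomic distances, which a rigid transform must preserve."""
    return [math.dist(a, b) for i, a in enumerate(coords) for b in coords[i + 1:]]


rng = random.Random(0)
atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0), (0.0, 1.5, 1.0)]
noisy, target = training_pair(atoms, sigma=0.5, rng=rng)
```

Because the model sees the same molecule in many random poses, it can learn that only the internal geometry matters (note that `pairwise_distances(target)` matches `pairwise_distances(atoms)`), without the network itself being built to be invariant or equivariant.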
The procedure for training the model has also been refined, especially for better data efficiency and to allow it to learn effectively from smaller datasets. This was essential because structural information about ions, small molecules, and nucleic acids is far scarcer than structural information about proteins in the Protein Data Bank, the major source of structural data for biological macromolecules. Also to augment the training set, confident structures predicted by AlphaFold-Multimer 2 were used as additional training data.
Last, like any good AI model for structure prediction, AlphaFold 3 also predicts the confidence of its own predicted models. Ever since AlphaFold 1, which I evaluated during CASP13, DeepMind has taken very seriously the task of providing not just 3D models but also confidence metrics. Since AlphaFold 3 handles more than protein structures, it had to be adapted to also produce scores for the non-protein components. For this, AlphaFold 3 uses new error prediction measures at the atom and pairwise levels, based on a diffusion "rollout" during training.
Changing how we study biology, discover drugs, and advance biotechnology
AlphaFold 3’s exceptional precision, especially in predicting ligand and antibody interactions, surpasses that of physics-based tools. And this is excellent news not just for fundamental biology, but also for pharma and biotech companies, as the program opens up radically new ways to explore problems of direct relevance to them.
For example, and I think this is where all these companies are going, tools like AlphaFold 3 (and others that are coming up soon, like RoseTTAFold All-Atom, which I already discussed) are in principle capable of running a procedure called "molecular docking" or "virtual screening" in a totally new way that, as the paper explains, works much better than conventional alternatives. Conventional docking programs could indeed be made totally obsolete by technologies like AlphaFold 3 or RoseTTAFold All-Atom. Basically, these programs take the structures of a small molecule and a protein and then search for where the small molecule binds on the protein. But notice that you need to start from structures, which by definition entail a given distribution of atoms in space, and these structures might be very different in the bound and unbound states.
With the new AI methods, the user provides only the protein’s sequence and the small molecule, and the model folds the protein "concomitantly" with testing binding of the small molecule, thus potentially capturing the structural changes required for binding. This is all speculative at the moment and hasn’t been proven (although my guess is that companies are already trying it out), but my point is that the technology at least allows it… so it might just be a matter of time until we see these new AI models effectively starting to replace the regular docking programs.
Notably along this line, Isomorphic Labs has declared that it is already harnessing AlphaFold 3’s capabilities for drug design, working alone and with pharmaceutical partners to innovate research and development.
Using AlphaFold 3 online in a few clicks
DeepMind has launched the AlphaFold Server, granting the global research community free access to most of AlphaFold 3’s features. This platform is a game-changer, simplifying the modeling of complex molecular structures and allowing biologists to explore new hypotheses and fast-track discoveries like never before. Indeed, with a few clicks at https://alphafoldserver.com/ (logged in with a Google account, of course) you can quickly input, say, a protein sequence and a ligand (a small molecule that presumably binds to the protein) and then model the atomic structure of the complex in seconds. See here for example a model of a protein that binds a group called heme (yes, the one that makes blood red):

As you see in this example, like AlphaFold 2, this new version outputs not only a structure but also confidence metrics. In this case, the whole protein and the heme group are predicted confidently, displaying all blue.
Now let’s see what happens if we ask AlphaFold 3 to model the same protein but with another ligand that doesn’t bind, ATP.

We now get a protein that seems to be folded confidently on the part farther away from the ligand-binding site (deeper blue), and possibly OK but not as reliable everywhere else (light blue). Then when it comes to the ligand, we see it’s all yellow, meaning that the confidence of this docking pose is low. Yet it did place it in the same pocket where the heme group fits in the real complex. My conclusion is that AlphaFold 3 "saw" a pocket and knew it had to put the ligand there, just that it wasn’t sure how. Not bad, although I would have expected lower scores, something like red for the ligand and maybe yellow for the protein.
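For readers who want to interpret these colors in numbers, here is a small sketch that maps a per-atom confidence score (pLDDT, on a 0 to 100 scale) to the color bands commonly used in AlphaFold viewers. The thresholds (90, 70, 50) and the color names are the convention as I know it, with the lowest band usually drawn in orange; treat them as an assumption and check the server’s own legend. The function names and example scores are hypothetical.

```python
def plddt_band(plddt):
    """Map a pLDDT score (0-100) to a (confidence label, color) pair."""
    if not 0.0 <= plddt <= 100.0:
        raise ValueError("pLDDT must be between 0 and 100")
    if plddt > 90:
        return ("very high", "dark blue")
    if plddt > 70:
        return ("confident", "light blue")
    if plddt > 50:
        return ("low", "yellow")
    return ("very low", "orange")


def mean_band(scores):
    """Summarize a group of atoms (e.g. one ligand) by its mean pLDDT."""
    return plddt_band(sum(scores) / len(scores))


# Made-up scores mimicking the ATP example: a ligand placed with low confidence.
ligand_scores = [60.0, 55.0, 65.0]
print(mean_band(ligand_scores))
```

Averaging over all atoms of a ligand is just one simple way to summarize a docking pose; a pose with a few very poorly placed atoms can still hide behind a reasonable mean, so looking at the per-atom values remains worthwhile.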
Just as happened when AlphaFold 2 was released, researchers are already running more tests like this and posting them on social networks for all other scientists to follow and contribute. For example, here Sergey Ovchinnikov showed us how to use AlphaFold 3 to automatically "detect" which molecules of a mixture might bind to a protein:
In this other case, Jan Kosinski took a transcription factor (a protein that binds DNA) with an unknown structure and folded it with its recognition sequence embedded in longer DNA, to find that AlphaFold 3 could very accurately position the transcription factor onto the DNA. He then ran even more interesting tests, such as probing the effects of mutations in the DNA molecule on the predicted binding of the transcription factor.
A bright future, but only for a few?
With these advancements, we are on the cusp of unlocking the mysteries of life at a molecular level in ways that we didn’t even dream about just one decade ago. Unfortunately, though, the system is far more closed than all previous versions of AlphaFold: no code, no weights, a patent filed. And although you can use AlphaFold 3 for free, the GUI gives you very little control, and you are limited to running only around 10 prediction tasks per day.
The scientific community is already complaining about this, and they do have a point, because these AI models couldn’t have been trained if it were not for the hundreds of thousands of structures made freely available by the Protein Data Bank – actually funded by public taxes from a large number of countries. Even one of the reviewers of the paper reporting AlphaFold 3 broke his silence on X and explained that his demands to Nature asking for open source compliance were totally dismissed by the journal:
In a way, however, this is our (the scientists’) own fault, because we tend to leave all data way too open and, most importantly, without clear clauses on what can or can’t be done with it… After all, the players involved here are companies that make huge investments and then expect huge returns. And that’s exactly what you can get if you can design new and better drugs, faster and cheaper, using tools like this one.
Do leave your comments below on this important aspect – or about any other point you want to discuss!
www.lucianoabriata.com I write about everything that lies in my broad sphere of interests: nature, science, technology, programming, etc.