Using artificial intelligence models known as large language models, scientists have made major advances in their ability to predict a protein's structure from its amino acid sequence. However, this approach has been less successful for antibodies, in part because of the hypervariability seen in this type of protein.
To overcome that limitation, MIT researchers have developed a computational technique that allows large language models to predict antibody structures more accurately. Their work could enable researchers to sift through millions of possible antibodies to identify those that could be used to treat SARS-CoV-2 and other infectious diseases.
The results are published in the journal Proceedings of the National Academy of Sciences.
“Our approach allows us to scale, whereas others do not, to the point where we can actually find a few needles in the haystack,” says Bonnie Berger, the Simons Professor of Mathematics, head of the Computation and Biology group in MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), and one of the senior authors of the new study. “If we could help to stop pharmaceutical companies from going into clinical trials with the wrong candidates, it would save a great deal of money.”
The technique, which focuses on modeling the hypervariable regions of antibodies, also opens the door to analyzing entire antibody repertoires from individual people. This could be useful for studying the immune response of people who are exceptionally good at fighting off diseases such as HIV, to help explain why their antibodies fend off the virus so effectively.
Bryan Bryson, an associate professor of biological engineering at MIT and a member of the Ragon Institute of MGH, MIT, and Harvard, is also a senior author of the paper. Rohit Singh, a former CSAIL research scientist who is now an assistant professor of biostatistics and bioinformatics and of cell biology at Duke University, and recent graduate Chiho Im are the paper’s lead authors. Researchers from Sanofi and ETH Zurich also contributed to the study.
Modeling hypervariability
Proteins are made of long chains of amino acids, which can fold into an enormous number of possible structures. In recent years, predicting these structures has become much easier, thanks to artificial intelligence programs such as AlphaFold.
Many of these programs, such as ESMFold and OmegaFold, are built on large language models, which were originally developed to analyze vast amounts of text by learning to predict the next word in a sequence. The same approach can work for protein sequences, by learning which protein structures are most likely to arise from different patterns of amino acids.
However, this strategy does not always work for antibodies, especially for a segment of the antibody known as the hypervariable region. Antibodies usually have a Y-shaped structure, and these hypervariable regions are located at the tips of the Y, where they detect and bind to foreign proteins known as antigens. The bottom part of the Y provides structural stability and helps antibodies interact with immune cells.
Hypervariable regions vary in length but usually contain fewer than 40 amino acids. It has been estimated that the human immune system can produce up to 1 quintillion different antibodies by changing the sequence of these amino acids, helping to ensure that the body can respond to a huge variety of potential antigens. Those sequences are not evolutionarily constrained in the same way that other protein sequences are, which makes it difficult for large language models to learn to predict their structures accurately.
“One reason language models excel at predicting protein structures is that evolution constrains these sequences in ways that the model can learn to interpret,” Singh says. “It’s akin to learning the rules of grammar by looking at the context of words in a sentence, which lets you figure out what it means.”
To model those hypervariable regions, the researchers created two modules that build on existing protein language models. One module was trained on hypervariable sequences from about 3,000 antibody structures in the Protein Data Bank (PDB), allowing it to learn which sequences tend to produce similar structures. The other module was trained on data linking about 3,700 antibody sequences to how strongly they bind to three different antigens.
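A rough sketch of that two-module setup is shown below, assuming the common pattern of small task-specific heads added on top of a frozen protein language model. The class name, layer sizes, and the chaining of the two heads are illustrative assumptions, not the published AbMap architecture.

```python
# Minimal sketch (not the authors' code): refine pretrained protein-language-model
# embeddings of the hypervariable region with two small task-specific modules.
# Names, dimensions, and the head-chaining below are assumptions for illustration.
import torch
import torch.nn as nn

class HypervariableHeads(nn.Module):
    def __init__(self, plm_dim=1280, hidden=256):
        super().__init__()
        # Module 1: nudges generic embeddings toward structure-awareness; in the study
        # this component was trained on hypervariable sequences from ~3,000 PDB structures.
        self.structure_head = nn.Sequential(
            nn.Linear(plm_dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden)
        )
        # Module 2: predicts binding strength; in the study this component was trained
        # on ~3,700 antibody sequences paired with binding data for three antigens.
        self.affinity_head = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, plm_embedding):
        z = self.structure_head(plm_embedding)        # structure-aware representation
        affinity = self.affinity_head(z).squeeze(-1)  # scalar binding score per antibody
        return z, affinity

# The input would come from a frozen protein language model (e.g., an ESM-style model),
# pooled over the hypervariable-region residues; random tensors stand in for it here.
model = HypervariableHeads()
fake_embeddings = torch.randn(8, 1280)    # batch of 8 antibodies
z, scores = model(fake_embeddings)
print(z.shape, scores.shape)              # torch.Size([8, 256]) torch.Size([8])
```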
The resulting computational model, known as AbMap, can predict antibody structures and binding strength based on their amino acid sequences. To demonstrate its usefulness, the researchers used the model to predict antibody structures that would strongly neutralize the spike protein of the SARS-CoV-2 virus.
The researchers started with a set of antibodies that had been predicted to bind to this target, then generated millions of variants by changing the hypervariable regions. Their model was able to identify the antibody structures that would bind most successfully, and it was much more accurate than traditional protein-structure models based on large language models.
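Conceptually, that screening step is a generate-and-rank loop, as in the sketch below; the scoring function is a placeholder rather than AbMap, and the starting hypervariable sequence is made up.

```python
# Illustrative generate-and-rank loop: mutate the hypervariable region of a starting
# antibody, score each variant, and keep the top candidates. The scorer here is a
# stand-in; in practice it would be the model's predicted binding affinity.
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def mutate(seq, n_mut=2, rng=random):
    """Return seq with n_mut random single-residue substitutions."""
    s = list(seq)
    for pos in rng.sample(range(len(s)), n_mut):
        s[pos] = rng.choice(AMINO_ACIDS)
    return "".join(s)

def predicted_affinity(seq):
    # Placeholder score in [0, 1); a real pipeline would splice this hypervariable
    # region into the full antibody and query the trained model.
    return (hash(seq) % 1000) / 1000.0

def screen(parent_hv, n_variants=100_000, top_k=10, seed=0):
    rng = random.Random(seed)
    variants = {mutate(parent_hv, rng=rng) for _ in range(n_variants)}
    ranked = sorted(variants, key=predicted_affinity, reverse=True)
    return ranked[:top_k]

best = screen("GRSYYYYGMDV")   # made-up CDR-like starting sequence
print(best[:3])
```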
Next, the researchers clustered the antibodies into groups with similar structures. They chose antibodies from each of these clusters to test experimentally, working with researchers at Sanofi. Those experiments found that 82% of the antibodies showed better binding strength than the original antibodies that went into the model.
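The cluster-then-test step can be pictured as in the sketch below, which groups candidates by the similarity of their embeddings (random stand-ins here) and sends the member nearest each cluster center for experimental testing; k-means and the cluster count are assumptions made for illustration.

```python
# Sketch of "cluster, then pick representatives": group candidate antibodies by the
# similarity of their structure-aware embeddings and choose one per cluster to test.
# The embeddings are random stand-ins; k-means and 20 clusters are arbitrary choices.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(500, 256))    # 500 candidate antibodies, 256-dim embeddings

kmeans = KMeans(n_clusters=20, n_init=10, random_state=0).fit(embeddings)

representatives = []
for c in range(kmeans.n_clusters):
    members = np.where(kmeans.labels_ == c)[0]
    # Use the member closest to the cluster centroid as that cluster's representative.
    dists = np.linalg.norm(embeddings[members] - kmeans.cluster_centers_[c], axis=1)
    representatives.append(int(members[np.argmin(dists)]))

print(sorted(representatives))   # indices of the antibodies to test experimentally
```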
Identifying a variety of promising candidates early in the development process could help pharmaceutical companies avoid spending large sums on testing candidates that ultimately fail, the researchers say.
“They prefer not to put all their resources into one option,” Singh explains. “They want to avoid committing to a single antibody for preclinical trials only to discover it has toxic effects. It’s more advantageous to have a selection of strong candidates and progress all of them, ensuring they have alternatives if one fails.”
Comparing antibodies
Using this technique, researchers could also try to answer longstanding questions about why people respond to infection differently. For example, why do some people develop much more severe forms of COVID, and why do some people who are exposed to HIV never become infected?
Scientists have been trying to answer those questions by performing single-cell RNA sequencing of immune cells from different people and comparing them, a process known as antibody repertoire analysis. Previous work has shown that antibody repertoires from two different people may overlap by as little as 10%. However, sequencing does not offer as complete a picture of antibody performance as structural information, because two antibodies with different sequences may have similar structures and functions.
The new model can help solve that problem by rapidly generating structures for all of the antibodies found in an individual. In this study, the researchers showed that when structure is taken into account, there is much more overlap between individuals than the 10% seen in sequence comparisons. They now plan to investigate how these structures may contribute to the body's overall immune response against a particular pathogen.
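The intuition can be seen in a small synthetic example: two repertoires that share essentially no identical antibodies can still overlap heavily once near neighbors in embedding space, used here as a stand-in for predicted structure, are counted. All numbers below are made up.

```python
# Toy illustration (synthetic data, not from the study) of sequence-level versus
# structure-level repertoire overlap: exact matches are rare, but many antibodies in
# repertoire A have a close structural neighbor in repertoire B.
import numpy as np

def near_neighbor_overlap(emb_a, emb_b, threshold):
    """Fraction of antibodies in A with at least one neighbor in B within threshold."""
    dists = np.linalg.norm(emb_a[:, None, :] - emb_b[None, :, :], axis=-1)
    return float((dists.min(axis=1) < threshold).mean())

rng = np.random.default_rng(1)
emb_a = rng.normal(size=(200, 64))                      # repertoire A: 200 antibodies
emb_b = np.vstack([
    emb_a[:120] + 0.1 * rng.normal(size=(120, 64)),     # structurally similar to part of A
    rng.normal(size=(80, 64)),                          # plus unrelated antibodies
])

# Exact matches (the analogue of identical sequences) are essentially absent...
print(near_neighbor_overlap(emb_a, emb_b, threshold=1e-6))   # ~0.0
# ...yet most of repertoire A has a close structural neighbor in B.
print(near_neighbor_overlap(emb_a, emb_b, threshold=2.0))    # ~0.6
```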
“Here is where a language model integrates beautifully, as it offers the scalability of sequence-based analysis while approaching the precision of structure-based analysis,” Singh comments.
More information:
Rohit Singh et al, Learning the language of antibody hypervariability, Proceedings of the National Academy of Sciences (2024). DOI: 10.1073/pnas.2418918121
This article is republished with permission from MIT News (web.mit.edu/newsoffice/), a well-known site that reports on MIT research, innovation, and education.
Citation: New computational model can predict antibody structures more accurately (2025, January 2), retrieved 2 January 2025 from https://phys.org/news/2025-01-antibody-accurately.html
This document is subject to copyright. Apart from any fair use for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.