DeepMind artificial intelligence maps 200 million proteins

AlphaFold, the artificial intelligence system created by DeepMind, has predicted and cataloged the structures of more than 200 million proteins, practically all that are known to exist, according to scientists.

Over the past 50 years, identifying these structures has been one of the greatest challenges in biology. The AI managed to do the job in just 18 months, a time considered a record for science.

DeepMind revealed the number this week when it announced that it was extending free access to the AlphaFold protein structure database to researchers worldwide.

According to the company, the models produced by the AI have already been cited in more than 4,000 scientific studies since AlphaFold's creation in 2020.

The work of these two years was carried out in partnership with the European Bioinformatics Institute of the European Molecular Biology Laboratory (EMBL-EBI). Information on the protein structures has also been published on UniProt, the world-renowned protein research repository.

DeepMind is the UK-based artificial intelligence arm of Alphabet, Google’s parent company.

“Gift to Humanity”

One of the scientists who worked on the project, Ewan Birney of EMBL-EBI, went so far as to say that he considers the AlphaFold database “a gift to mankind”.

“As someone who’s been working in genomics and computational biology since the 1990s, I’ve seen a lot of those times where you can feel the landscape changing beneath you and new features being made available, and that was one of the fastest. Two years ago, we just didn’t know it was doable,” Birney said in an interview with New Scientist.

With this breakthrough, AlphaFold has revolutionized knowledge in biology and basic sciences, which will allow scientists to better understand the evolution of many diseases and to develop new drugs and products.

According to New Scientist, the database has enabled advances in areas such as fighting malaria, combating antibiotic-resistant bacteria and developing enzymes that break down plastic waste.

Proteins are difficult structures to understand

Proteins are made up of long strands of amino acids. Some amino acids are attracted to one another, others are repelled by water, and as a result the chains twist into complex shapes that are difficult to determine with precision. Untangling these shapes is essential to understanding many diseases and to fighting certain species of pests.

In the case of malaria, for example, understanding the proteins involved makes it possible to break the life cycle of the disease. This is why the AI and its database are such important milestones for science.

In an interview with MIT Technology Review, Demis Hassabis, DeepMind’s chief executive, said protein folding is a problem he has been trying to solve for more than 20 years.

“I would say it’s the biggest thing we’ve done so far. It’s the most exciting in a way, because it should have the biggest impact on the world outside of artificial intelligence.”

What are the next steps?

Scientists recognize the importance of AlphaFold but say it is now time to go further and tackle other open problems involving proteins.

For some scientists, the AI has its limits, and the tool’s precision needs to improve through a model that understands how proteins fold and not just what their final structure is.

In an interview with New Scientist, Keith Willison of Imperial College London said AlphaFold is not able to take an arbitrary sequence of amino acids and model exactly how it folds. There are also complex and messy interactions between proteins, which follow no clear patterns and have not yet been solved by AI.

To predict how a new protein will fold, the tool can only draw on the protein fragments and structures that have already been determined experimentally. In other words, the structures it produces are always predictions and not calculated results.

Pushmeet Kohli, who leads DeepMind’s science team, says the company is working to improve accuracy and capabilities. “We know the static structure of proteins, but that’s not where the game stops,” he acknowledged.

“We want to understand how these proteins behave, what their dynamics are, how they interact with other proteins. Then there’s the other area of genomics where we want to understand how the recipe for life translates into what proteins are created, when they are created and how a cell works,” he explained.
