DeepMind's AlphaFold solves the mystery of protein folding

science_math · views 748 · 2020.12.10


-Genes contain the information needed to build and maintain the cells of an organism, as well as the information the cells need to work together as an organized whole.

-A gene is a long chain of four types of nucleotides connected in varying order. The four types are commonly known by their bases: adenine (A), guanine (G), cytosine (C), and thymine (T).

-Protein is a macromolecular organic substance that makes up the bodies of living organisms and carries out cellular activities. It is composed of many small molecules called amino acids linked to one another.

-Like genes, proteins are long chains: several types of amino acids are connected in varying order, and the order in which the four types of bases are arranged in a gene specifies the order in which the amino acids are arranged.

-In other words, a gene is information about the sequence of amino acids that make up one protein, and a specific protein is created by connecting amino acids according to the blueprint written in the gene.
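The mapping from base order to amino acid order described above can be sketched in a few lines of Python. In the standard genetic code the bases are read three at a time (one "codon" per amino acid), a detail the text above does not spell out. The table here is only a small subset of that code, and the function name `translate` and the example sequence are made up for illustration.

```python
# A small, hand-picked subset of the standard genetic code (DNA codons).
# A full table has 64 entries; this toy table is an assumption for illustration.
CODON_TABLE = {
    "ATG": "Met",  # methionine, also the usual start signal
    "TTT": "Phe",  # phenylalanine
    "GGC": "Gly",  # glycine
    "GCT": "Ala",  # alanine
    "AAA": "Lys",  # lysine
    "TGG": "Trp",  # tryptophan
    "TAA": "STOP",
    "TAG": "STOP",
    "TGA": "STOP",
}


def translate(dna):
    """Read the DNA string three bases at a time and return the amino acid chain."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        # Codons missing from this toy table are marked "???" rather than guessed.
        amino_acid = CODON_TABLE.get(codon, "???")
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein


# Hypothetical 15-base sequence: ATG TTT GGC AAA TAA
print(translate("ATGTTTGGCAAATAA"))  # ['Met', 'Phe', 'Gly', 'Lys']
```

Changing a single base changes which amino acid is placed at that position, which is why the order of the bases acts as the blueprint for the protein.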

2. Protein form

-The human body is made up of more than 100,000 types of proteins. Every cell, tissue, and organ we have is built from proteins, each made of 20 kinds of amino acids in its own unique sequence and combination.

-Proteins are large, complex molecules essential to sustaining life. Almost every function our body performs, such as contracting muscles, sensing light, or converting food into energy, can be traced back to how proteins move and change.

-Cells are stacked three-dimensionally to make the body's tissues, whereas amino acids are connected in a single line to produce proteins. However, a protein cannot function properly as a simple linear chain; it works only after folding into the appropriate shape.

-The 3D structure a protein adopts depends on the types and number of amino acids it contains, and that shape determines the role the protein plays in the human body. Determining how a specific protein folds is therefore a very important part of understanding its function.

3. AlphaFold

(1) CASP (Community-Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction)

-In molecular biology, the structure of a protein determines its function, so scientists trying to cure genetic diseases and develop new drugs have long taken on the challenge of predicting protein structure.

-DeepMind, the Alphabet (Google) subsidiary that created the Go-playing AI 'AlphaGo', developed 'AlphaFold', an AI with analytical ability exceeding that of human scientists in the field of molecular biology.

-In November 2020, AlphaFold took part in CASP, a competition that evaluates the ability to predict protein structure, and matched the accuracy of human scientists on two-thirds of the given tasks.

-X-rays began to be used in the 1950s to determine protein structure, and cryo-electron microscopes have been used over the last 10 years. However, these methods require several years of research and special equipment costing billions of won.

-Predictive research using AI began at a rudimentary level in the 1990s, and CASP is an international competition that has been held every two years since 1994 to drive performance improvements by pitting these AIs against one another.

-Performance in the competition is scored from 0 to 100 according to how closely the AI's prediction matches the protein structure determined in advance by human scientists (that is, how far apart the actual protein structure and the predicted structure are).

-In the previous competition in 2018, AlphaFold won first place on its first appearance with 60 points (a 60% match). In this year's competition, the more than 100 other AIs scored an average of 75 points, while AlphaFold scored higher with 90 points. This means that AlphaFold's average error is only about 0.1 nanometers.
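To make the 0 to 100 scale more concrete, here is a minimal sketch of a distance-based score in the spirit of GDT_TS, the measure CASP reports: the share of residues whose predicted position lies within 1, 2, 4, and 8 ångström of the experimental position, averaged over the four cutoffs (0.1 nanometers is 1 ångström). It assumes the two structures are already superimposed and that each residue is represented by a single point; the real measure also searches for the best superposition, which is left out here. The function name `gdt_like_score` and the coordinates are hypothetical.

```python
import math

# Distance cutoffs in ångström (1 Å = 0.1 nm).
CUTOFFS_ANGSTROM = (1.0, 2.0, 4.0, 8.0)


def gdt_like_score(predicted, actual):
    """Return a 0-100 score from per-residue distances between two structures.

    Simplification (assumption): the structures are already superimposed,
    so no search for the best alignment is performed.
    """
    if len(predicted) != len(actual) or not actual:
        raise ValueError("structures must be non-empty and of equal length")
    distances = [math.dist(p, a) for p, a in zip(predicted, actual)]
    # For each cutoff, the fraction of residues predicted within that distance.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in CUTOFFS_ANGSTROM
    ]
    return 100.0 * sum(fractions) / len(fractions)


# Made-up coordinates: four residues, predicted 0.5, 1.5, 3.0 and 10.0 Å off.
actual = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 10.0, 0.0)]

print(gdt_like_score(predicted, actual))  # 56.25
print(gdt_like_score(actual, actual))     # 100.0
```

A score near 90 therefore means that almost every residue sits within the tightest cutoffs of its experimentally determined position.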




(2) Research status

-Until now, scientists have tried to interpret the genetic information in DNA and use computers to predict protein structure from it, but the results have been insignificant despite more than 50 years of effort. The number of ways a chain of amino acids can fold into a three-dimensional structure is so vast that it is very difficult to predict the structure from genetic information alone.

-In 1969, the American molecular biologist Cyrus Levinthal explained that a protein can fold in about 10^300 possible ways, and that working through them all would take longer than the age of the universe. Short of extraordinary luck, it is not a number humans can handle.

-Therefore, until now the three-dimensional structure has been confirmed by directing X-rays at a protein crystal and detecting the scattered waves, but with this method the crystal is difficult to make and it is not easy to work out the structure from the X-ray scattering information.

-By contrast, DeepMind's developers trained the AI on about 170,000 protein structure records, combining DNA genetic information with the three-dimensional structures of proteins already confirmed through experiments. This lets AlphaFold work out for itself the correlation between genetic information and protein conformation from a large body of data.



-For this competition, training AlphaFold took only a few weeks on 128 TPUv3 cores. This is roughly equivalent to 100 to 200 GPUs of computing power, which is relatively small compared with most large state-of-the-art models used in machine learning today.

-AlphaFold succeeded in predicting the protein structure of the coronavirus soon after China released the virus's genetic information in January 2020. With predictions like this, drugs that bind well to the coronavirus protein can be selected from among existing treatments without conducting expensive experiments.

-Recently, Dr. Andrei Lupas of the Max Planck Institute in Germany reported that AlphaFold found in half an hour a specific protein structure that had gone unsolved for 10 years. Lupas says that protein structure analysis will in future depend entirely on computers, which could completely change the course of medicine.

-Scientists expect AlphaFold to bring a big leap forward in molecular biology. Dr. Janet Thornton of the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) in the UK, a former CASP judge, explains that AlphaFold's approach will help reveal the functions of numerous proteins in the human genome and help us understand the genetic variations that cause diseases that differ from person to person.

-Of course, there are still limits. Proteins that are affected by surrounding proteins, that is, interactions between proteins, are known to be difficult to analyze with AlphaFold. DeepMind plans to keep working on this, and will soon publish a paper on AlphaFold's ability to predict protein structure.
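Levinthal's estimate quoted earlier in this section can be checked with a rough back-of-the-envelope calculation. The 10^300 figure comes from the text; the sampling rate of 10^13 conformations per second and the age of the universe of about 13.8 billion years are assumptions added here for illustration. The arithmetic is done with base-10 logarithms to keep the numbers manageable.

```python
import math

CONFORMATIONS_LOG10 = 300        # from the text: about 10^300 possible folds
SAMPLES_PER_SECOND_LOG10 = 13    # assumption: 10^13 conformations tried per second
AGE_OF_UNIVERSE_YEARS = 13.8e9   # assumption: about 13.8 billion years
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def log10_universe_ages_needed(conformations_log10, samples_per_second_log10):
    """Return log10 of (exhaustive search time / age of the universe)."""
    search_seconds_log10 = conformations_log10 - samples_per_second_log10
    universe_seconds_log10 = math.log10(AGE_OF_UNIVERSE_YEARS * SECONDS_PER_YEAR)
    return search_seconds_log10 - universe_seconds_log10


exponent = log10_universe_ages_needed(CONFORMATIONS_LOG10, SAMPLES_PER_SECOND_LOG10)
print(f"Exhaustive search would take about 10^{exponent:.0f} times the age of the universe")
# Exhaustive search would take about 10^269 times the age of the universe
```

Even a sampling rate many orders of magnitude faster barely dents the exponent, which is why trying every fold one by one is not a workable approach.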

4. Personal opinion

-To put it very simply, the principle of the AlphaFold neural network is to build a three-dimensional structure by calculating the distances between pairs of amino acid residues and the angles between the chemical bonds that connect the amino acids (see the AlphaFold link above for more information).

-Detailed information on the structure of the neural network is not yet available, so we will have to wait a little longer to know for sure, but my personal impression is that the AlphaFold task seems much easier than the current work on natural language processing (NLP) with neural networks.

-However, while about 180 million naturally occurring protein sequences are stored in UniProt (Universal Protein Resource), the PDB (Protein Data Bank), whose data relate a sequence to its three-dimensional structure, holds only about 170,000 entries.

-So the real problem is simply that there is not enough input data; the task of building the neural network itself does not seem very difficult. If enough input data becomes available in the future, AlphaFold can be expected to achieve remarkable performance in the bio field, beyond current expectations.

-Since input data is this important, and considering the time and effort that go into creating it, I think a new business may emerge in this area.
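The two quantities named at the start of this section, distances between pairs of residues and angles between bonds, are easy to compute when the 3D coordinates are already known. The sketch below does only that easy direction; AlphaFold's task is the reverse, estimating such quantities from the sequence and then building the structure, and nothing here reflects DeepMind's actual code. The function names and the three-point example are hypothetical.

```python
import math


def pairwise_distances(coords):
    """Return the full matrix of distances between every pair of points."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def bond_angle_degrees(a, b, c):
    """Return the angle at point b, in degrees, between bonds b-a and b-c."""
    v1 = [p - q for p, q in zip(a, b)]
    v2 = [p - q for p, q in zip(c, b)]
    dot = sum(p * q for p, q in zip(v1, v2))
    cosine = dot / (math.dist(a, b) * math.dist(c, b))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, cosine))
    return math.degrees(math.acos(cosine))


# Made-up example: three points forming a right angle at the middle one.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]

for row in pairwise_distances(coords):
    print([round(d, 3) for d in row])
print(bond_angle_degrees(*coords))
```

For a real protein the distance matrix has one row and one column per residue, so it grows with the square of the chain length.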
