The protein folding problem has puzzled scientists for over 50 years. No solution to this immensely important problem seemed to be in sight, even in recent years, until AlphaFold, a system built by Google’s DeepMind, delivered a surprise breakthrough last month.
To begin with, what is the protein folding problem?
To answer this question, an understanding of the nature of proteins is required. Proteins are long chains of amino acids joined together by peptide bonds. Each chain (a polypeptide) then folds into a three-dimensional shape, held in place by hydrogen bonds and other interactions between its amino acids. Some proteins consist of a single folded polypeptide, while others are built from several polypeptides which interact with each other. How a chain arrives at its final shape has confused scientists for decades. The protein folding problem is the challenge of predicting the shape of a protein given the sequence of amino acids in its polypeptide chains.
At first this may seem useless: what importance does knowing the shape of proteins have to humans? Quite a lot. The shape of a protein determines its function. For example, enzymes, which are proteins, have active sites with specific shapes, and these determine which reactions they can catalyse and which they cannot. Even a slight change in the amino acid sequence of a protein can alter its shape and contribute to cancers and other unwanted effects. Proteins matter beyond enzymes too: many toxins are proteins, and proteins are an important part of food. Being able to predict protein shapes from their constituent polypeptide chains means that doctors can make judgements about the exact effect of certain proteins on the body, and about how a change in the polypeptides would change the function of the protein and therefore harm or help the body. [1]
At this point, many would think that there should have been much progress before this year in solving this problem, as it is so important to the advancement of science and healthcare. Unfortunately, that had not occurred. Progress is measured at CASP, the Critical Assessment of protein Structure Prediction, a contest held every two years. Since 1994, when it was set up, competing research teams have tried to build the system which best predicts the structures of proteins whose shapes have not been revealed to them. In the scientific community, a system which can predict protein structures with an accuracy of more than about 90% is considered to have solved the problem. From 2002 to 2016, the predictive power of the best systems plateaued at around 60%, and from 2006 to 2016 it even slipped from one contest to the next, from around 62% to about 57%. It was during this period that many lost faith in the problem being solved soon.
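To give a feel for what a percentage score for a predicted structure can mean, here is a minimal sketch of a simplified score in the spirit of GDT_TS, the measure CASP uses to rank predictions. It assumes that the percentages quoted above are scores of this kind. The function name, the cutoffs as a parameter and the example coordinates are all made up for illustration, and the sketch assumes the two structures have already been laid on top of each other, whereas the real measure searches for the best superposition.

```python
import math


def gdt_ts_like(predicted, experimental, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS-style score between 0 and 100.

    For each distance cutoff (in angstroms), work out the percentage of
    residues whose predicted position lies within that cutoff of the
    experimental position, then average those percentages.
    Assumption: both structures are already superimposed.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty coordinate lists of equal length")
    distances = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    percentages = [
        100.0 * sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return sum(percentages) / len(percentages)


# Invented coordinates for a four-residue toy "protein" (not real data).
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.5, 0.0, 0.0), (3.8, 1.5, 0.0), (7.6, 0.0, 3.0), (11.4, 9.0, 0.0)]

score = gdt_ts_like(predicted, experimental)
print(f"Score: {score:.2f}")  # prints "Score: 56.25"
```

In this toy case the four residues are off by 0.5, 1.5, 3.0 and 9.0 angstroms, so 25%, 50%, 75% and 75% of them fall within the four cutoffs, giving an average of 56.25. A perfect prediction would score 100.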
However, this changed at CASP 13 in 2018, where AlphaFold, a system built by DeepMind, recorded a predictive power of about 73%, breaking the plateau reached in previous years. DeepMind is an AI research company owned by Google’s parent company, Alphabet, and AlphaFold applies DeepMind’s advances in AI to predicting the shapes of proteins. After this point, the unthinkable happened. At CASP 14 this year, AlphaFold recorded an accuracy of about 90% in predicting protein structures, surpassing the threshold required for the scientific community to see this system as a solution to the problem. Before 2018, and even before this year, many biologists and other scientists were quite understandably doubtful that AI could help solve such a complicated problem. However, with CASP 13 and 14, the doubters have been proved wrong, indicating just how important AI is likely to be for medicine and science in the future. [2]
But how has AlphaFold been able to use AI to surpass all previous attempts at CASP? AI stands for Artificial Intelligence, and a huge part of Artificial Intelligence is machine learning. This is where the AI is given many different examples and learns to notice patterns in them, so that after enough training it becomes very good at making predictions from some initial information. In the context of the protein folding problem, the AI was given many proteins with known shapes, along with the amino acid sequences of their constituent polypeptide chains. After learning the patterns linking sequences to shapes, the AI was then able to predict the shapes of proteins given only their polypeptide chains.
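The idea of learning from examples can be shown with a deliberately tiny sketch. This is not how AlphaFold works: AlphaFold uses deep neural networks to predict full three-dimensional structures, whereas the toy below only counts which shape label each amino acid letter carried most often in a handful of examples. The function names, the labels (H for helix, E for strand, C for coil) and the training pairs are all invented for illustration.

```python
from collections import Counter, defaultdict


def learn_patterns(examples):
    """Count, for each amino acid letter, how often it carried each label."""
    counts = defaultdict(Counter)
    for sequence, labels in examples:
        for residue, label in zip(sequence, labels):
            counts[residue][label] += 1
    return counts


def predict(counts, sequence, default="C"):
    """Give each residue the label it carried most often in the examples.

    Residues never seen during learning get the default label.
    """
    return "".join(
        counts[residue].most_common(1)[0][0] if residue in counts else default
        for residue in sequence
    )


# Invented training pairs (sequence, per-residue label); not real measurements.
examples = [
    ("AAEL", "HHHH"),
    ("VIVG", "EEEC"),
    ("GPAL", "CCHH"),
]

model = learn_patterns(examples)
prediction = predict(model, "ALVG")
print(prediction)  # prints "HHEC"
```

The new sequence "ALVG" never appeared in the examples, yet the program can still make a guess because it has seen each of its letters before. Real systems learn far richer patterns from far more data, but the principle of generalising from known examples to unseen cases is the same.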
The next step for the team behind AlphaFold is to fine-tune the system so that its predictive power gets closer to 100%. To do this, they are hoping to make the AI learn better how individual atoms are arranged within the overall shape of a protein, improving our understanding of the exact shape of different proteins. There is no doubt about the incredible work of the AlphaFold team, which shows how AI can be a force for good in medicine and has largely solved one of the most intriguing problems biology and medicine have faced in the last 50 years: the protein folding problem.
- Lavars, N., 2020. DeepMind AI Solves 50-Year Protein Folding Problem In “Stunning Advance”. [online] New Atlas. Available at: <https://newatlas.com/biology/deepmind-ai-50-year-protein-folding-problem/> [Accessed 5 December 2020].
- Zimmer, M., 2020. AI Makes Huge Progress Predicting How Proteins Fold – One Of Biology’s Greatest Challenges – Promising Rapid Drug Development. [online] The Conversation. Available at: <https://theconversation.com/ai-makes-huge-progress-predicting-how-proteins-fold-one-of-biologys-greatest-challenges-promising-rapid-drug-development-151181> [Accessed 5 December 2020].