We can make it better – improving the performances of AI-based predictors of protein structures

D'Arminio N; Gil Zuluaga FH; Bardozzo F; Marabotti A; Tagliaferri R
2022-01-01

Abstract

Motivation: The introduction of artificial intelligence (AI) methods into structural bioinformatics has revolutionized the prediction of the three-dimensional structures of proteins. In particular, the 14th Community-wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP14), held in 2020, showed that deep learning methods (represented by AlphaFold [1]) are a watershed that has completely changed the approach to this kind of biological problem [2,3]. Since then, other deep learning-based approaches have been developed, such as RoseTTAFold [4]; in addition, AlphaFold has enabled an unprecedented effort to predict the 3D structures of the human proteome [5], and all these models are now freely available in a dedicated database hosted at https://alphafold.ebi.ac.uk/. Expectations are therefore high for other envisaged applications, such as structure-based drug design [6]. However, such applications require highly accurate structures, and the results produced by AlphaFold in CASP14 suggest that there is still room for improvement. A traditional approach to protein structure prediction by comparative modelling is the one developed by Sali and Blundell, based on the satisfaction of spatial restraints [7] and implemented in the widely used program MODELLER [8]. We wanted to test whether coupling these two approaches could improve the final models over those obtained with either approach alone, and whether particular structural categories of proteins could benefit from this coupling.

Methods: Our methodology combines AlphaFold [1] and MODELLER [8] to predict protein structures. The experiments were conducted on 46 structures from the CASP14 target list (https://predictioncenter.org/casp14/targetlist.cgi). These structures were compared using the SSAP algorithm [9] in order to classify them into 10 structural classes on the basis of their secondary structure content. Zemla’s Global Distance Test (GDT), which measures the similarity between two protein structures with known amino acid correspondences [10], was used to assess the quality of the predicted structures. The GDT score was also used to test different arrangements of AlphaFold and MODELLER in order to identify the categories of proteins that would benefit most from our pipeline. This also helped to identify which combinations of the tools in a pipeline, and which of their single predictions, could be evaluated.

Results: We modelled and evaluated hundreds of new protein structures using AlphaFold and MODELLER in combination with different parameters. Our preliminary results show that AlphaFold and MODELLER tend to produce different results when used together, and in several cases this combination improves the GDT score for specific protein classes. In particular, for 23 CASP targets the new pipeline improved the GDT score relative to the original score obtained in CASP14 with AlphaFold alone, for 15 targets the score decreased, and for 8 targets it did not change. The average GDT score for all 46 proteins was 82.65 and the median score was 90.05, with a standard deviation of 18.21. The largest improvement was found for target T1047s2-D1, whose GDT-TS score increased by 4.82. However, a notable decrease was found for target T1036s1-D1, whose GDT score dropped by 8.29. The results were also evaluated according to the SSAP-based classification of the target structures: targets in 5 of these categories showed an overall improvement, with an average GDT score of 86.5. Moreover, a relative magnitude analysis was applied to the group of proteins that showed an improvement and the group that showed a decrease in GDT score. It showed that the magnitude of the improvement in the first group was twice as great as the magnitude of the decrease in the second group. These results show that improvement can still be achieved. The data might also suggest which strategies to pursue in order to improve AI-based algorithms for protein structure prediction in the future.

References:
1. Jumper J et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596, 583-9.
2. Callaway E. ‘It will change everything’: DeepMind’s AI makes gigantic leap in solving protein structures. Nature 2020, 588, 203-4.
3. Service RF. ‘The game has changed.’ AI triumphs at solving protein structures. Science 2020. doi: 10.1126/science.abf9367
4. Baek M et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 2021, 373, 871-6.
5. Tunyasuvunakool K et al. Highly accurate protein structure prediction for the human proteome. Nature 2021, 596, 590-6.
6. Tong AB et al. Could AlphaFold revolutionize chemical therapeutics? Nature Struct Mol Biol 2021, 28, 771-2.
7. Sali A, Blundell TL. Comparative protein modelling by satisfaction of spatial restraints. J Mol Biol 1993, 234, 779-815.
8. Webb B, Sali A. Comparative protein structure modeling using Modeller. Curr Prot Bioinform 2016, 54, 5.6.1-5.6.37.
9. Orengo C, Taylor WR. SSAP: Sequential structure alignment program for protein structure comparison. Methods Enzymol 1996, 266, 617-35.
10. Zemla A. LGA: a method for finding 3D similarities in protein structures. Nucleic Acids Res 2003, 31, 3370-4.
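
As an illustration of the GDT measure used in the Methods, the following is a minimal sketch of a simplified GDT-TS calculation. GDT-TS is conventionally reported as the average, over distance cutoffs of 1, 2, 4 and 8 Å, of the percentage of Cα atoms in the model that lie within the cutoff of the corresponding atoms in the reference structure. The sketch assumes the two structures are already superimposed and have a one-to-one residue correspondence, whereas the LGA program [10] searches over many superpositions for each cutoff, so values from this sketch can be lower than those reported by LGA. The function name `gdt_ts` and the example coordinates are hypothetical and are not taken from the abstract or from the authors' evaluation code.

```python
import math

# Distance cutoffs in angstroms conventionally used for GDT-TS.
CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def gdt_ts(model_ca, ref_ca, cutoffs=CUTOFFS):
    """Simplified GDT-TS for two pre-superimposed lists of C-alpha coordinates.

    Assumption: residue i of the model corresponds to residue i of the
    reference, and no further superposition search is performed.
    """
    if len(model_ca) != len(ref_ca) or len(ref_ca) == 0:
        raise ValueError("expected two non-empty coordinate lists of equal length")
    # Distance between each pair of corresponding C-alpha atoms.
    distances = [math.dist(m, r) for m, r in zip(model_ca, ref_ca)]
    # Fraction of residues within each cutoff.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    # Average over cutoffs, expressed as a percentage.
    return 100.0 * sum(fractions) / len(fractions)


# Hypothetical four-residue example: per-residue deviations of 0.5, 1.5, 3.0 and 10.0 angstroms.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 10.0, 0.0)]
print(gdt_ts(model, reference))  # 56.25
```

In the example, one, two, three and three of the four residues fall within the 1, 2, 4 and 8 Å cutoffs respectively, giving (25 + 50 + 75 + 75) / 4 = 56.25. A difference between two such scores for the same target, for instance a pipeline model versus the AlphaFold model alone, corresponds to the kind of per-target GDT change reported in the Results.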

Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4812200