Navigating the Explainable Molecular Graph: Best Practices for Representation Learning in Bioinformatics

Zaccagnino, Rocco; Benevento, Gerardo; Laurenzano, Gianpaolo; Malandrino, Delfina; Petescia, Alessia; Zaccagnino, Gianluca

doi:10.1109/BIBE63649.2024.10820484

Machine learning (ML) has shown significant success in real-world scenarios where data is represented in the Euclidean domain. However, in the biomedical field, complex relational information between biological entities is often encapsulated in non-Euclidean structures, such as biomedical graphs, which traditional ML methods struggle to learn from. Graph representation learning aims to embed graphs into a low-dimensional space while preserving their topology and properties. This approach, generally organized into graph embedding techniques and graph neural networks (GNNs), bridges the gap between complex biomedical graphs and modern ML methods. Recently, it has garnered widespread interest as it offers a powerful framework for leveraging the relational information inherent in biomedical data. In this context, navigating graph-based biological problems is challenging, because the relational data are intricate and the graph structure must be preserved during embedding. The goal of this paper is to clarify which of these two main approaches (graph embedding techniques or GNNs) is more suitable for one of the most successful applications of graph representation learning: molecular property prediction. Molecules contain many types of substructures that may affect their properties, and recognizing the substructures and relations embedded in a molecular structure representation is crucial for structure-activity relationship and structure-property relationship studies. By examining the effectiveness of different graph representation learning techniques in this context, this paper aims to provide valuable insights and guidelines for researchers and practitioners. To this end, we carried out experiments on four well-known benchmarking datasets for molecular property prediction tasks, showing that GNNs are more effective.
We also developed a platform for experiments in molecular property prediction with GNNs, which integrates an attention mechanism to highlight the atoms within a molecule that most significantly impact its biological function. This preliminary study not only demonstrates that GNNs are more effective for these tasks but also validates their suitability for an “explainable” AI approach. This advancement makes GNNs a powerful tool in the realm of molecular machine learning, facilitating both accurate predictions and enhanced understanding of the underlying molecular mechanisms.
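
To make the idea of attention over atoms concrete, the following is a minimal illustrative sketch, not the platform described in the abstract. It assumes a toy three-atom graph (the heavy atoms of ethanol), hypothetical two-dimensional atom features, a simple mean-aggregation message-passing step, and a made-up query vector standing in for parameters that a trained model would learn. The resulting per-atom weights are the kind of quantity that could be used to highlight influential atoms.

```python
import math

# Toy molecular graph: heavy atoms of ethanol (C-C-O), hydrogens omitted.
ATOMS = ["C", "C", "O"]
BONDS = [(0, 1), (1, 2)]

# Hypothetical atom features [is_carbon, is_oxygen]; real models use richer ones.
FEATURES = {"C": [1.0, 0.0], "O": [0.0, 1.0]}


def neighbors(n_atoms, bonds):
    """Build an undirected adjacency list from a list of bonds."""
    adj = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)
    return adj


def message_pass(h, adj):
    """One round: each atom averages its own state with its neighbors' states."""
    out = []
    for i, nbrs in enumerate(adj):
        group = [h[i]] + [h[j] for j in nbrs]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_readout(h, query):
    """Score each atom against a query vector, normalize, and pool the atoms."""
    scores = [sum(a * q for a, q in zip(vec, query)) for vec in h]
    weights = softmax(scores)
    dim = len(h[0])
    pooled = [sum(w * vec[d] for w, vec in zip(weights, h)) for d in range(dim)]
    return weights, pooled


adj = neighbors(len(ATOMS), BONDS)
h0 = [FEATURES[a] for a in ATOMS]
h1 = message_pass(h0, adj)

# Assumption: a fixed query vector; in a trained GNN this would be learned.
QUERY = [0.0, 2.0]
weights, molecule_vector = attention_readout(h1, QUERY)

for idx, (atom, w) in enumerate(zip(ATOMS, weights)):
    print(f"atom {idx} ({atom}): attention weight {w:.3f}")
```

With this particular query, the oxygen atom receives the largest weight and the carbon farthest from it the smallest. That ordering reflects only the hand-picked numbers, not any chemical finding.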

UniSa - IRIS Institutional Research Information System