Multiview learning in biomedical applications

Serra, A.; Galdi, P.; Tagliaferri, R.

doi:10.1016/B978-0-323-96104-2.00010-5

Motivation: In the era of big data, the richness and variety of available datasets have opened new horizons for investigators in the biomedical field. The ultimate challenge consists in building an integrated base of knowledge derived from heterogeneous sources. Multiview learning is the branch of machine learning concerned with the analysis of multimodal data, i.e., patterns represented by different sets of features extracted from multiple data sources. In recent years, multiview learning methodologies have become increasingly popular, and a large number of biomedical applications based on multiview data have been reported in the literature. For example, in bioinformatics, analyses can be based on multiple experiments investigating different facets of the same phenomenon, such as gene expression, microRNA expression, protein-protein interactions, and genome-wide association, to capture information regarding different aspects of biological systems. In the same way, neuroscience data analysis can benefit from different imaging modalities that allow the study of different features of the nervous system (e.g., structural vs. functional organization). Compared to the limited perspective offered by single-view analyses, the integration of multiple views can provide a deeper understanding of the underlying principles governing complex systems.

Results: In this work, we review the existing multiview methodologies and discuss their operation modes and principles, with the goal of fostering their further development in the biomedical field. We categorized the described methods according to three criteria: the type of data, the statistical problem, and the type of integration. This discussion, which highlights the advantages and disadvantages of different schools of thought, is intended to be a reference for those who want to start working with the integration of biomedical data.
We selected a number of representative examples in bioinformatics and neuroinformatics to show the potential of multiview learning applications for cutting-edge research problems. First, we explain how multiview clustering can be used to perform patient subtyping, that is, to identify groups of patients that share similar molecular characteristics and possibly similar reactions to treatment. Then, we introduce the drug-repositioning problem and discuss the multiview classification methods used in the literature. We then describe an example of how clustering and classification can be combined in a multiview setting for the automated diagnosis of neurodegenerative disorders, and we explain how multiple noninvasive imaging modalities can be exploited together to obtain more accurate brain parcellations. We additionally introduce the emerging fields of single-cell multiomics data analysis and brain imaging genomics. Finally, we discuss how deep learning techniques, which are gaining increasing recognition in various fields, can be applied to multimodal data to learn complex representations, and we present a few example applications.