Noise resistant clustering of high-dimensional gene expression data
Pietro Coretto; Angela Serra; Roberto Tagliaferri
2019-01-01
Abstract
Discovery of disease sub-types is one of the fundamental problems in clinical applications. This is usually accomplished by grouping patients based on gene expression data. However, microarray data sampling is very noisy, and this undermines the possibility of reaching scientific consensus on the empirical evidence. In this work we discuss the need for robust data analysis methods for gene expression data. We introduce and discuss recent proposals of clustering methods and algorithms that can handle noise effectively and that can scale with the typical dimension of microarray data. The methods and algorithms are tested on a selection of data sets obtained from the well-known “The Cancer Genome Atlas” repository.