The Genomic Signature: Methods and Computational Techniques for its Reliable Deciphering
DE SANTIS, Filomena
2004
Abstract
A variety of analyses of genome sequences support the proposal that each living organism has a genomic signature, related to the occurrence frequencies of particular patterns that can be revealed along its DNA string. The more classical approaches to deciphering the genomic signature are based on counting subsequences in a long DNA sequence. Such counting is well suited only to words of length at most four. For longer words, a more tractable tool derived from the theory of chaotic dynamical systems, the so-called Chaos Game Representation (CGR), is used to depict word frequencies as fractal images for any word length n. Experimental results obtained with CGR have also shown that the core features characterizing the whole genome are preserved in short subsequences, which validates the idea of a genomic signature. This paper collects the basic ideas, tools, and methodologies related to the genomic signature concept, following the tutorial path needed to investigate a possible mathematical representation that maintains the essential properties of the genomic signature and provides a more formal instrument of analysis.
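As an illustration of the two techniques named in the abstract, the following is a minimal Python sketch of overlapping word counting and of the Chaos Game Representation. It is not taken from the paper. The corner assignment (A, C, G, T on the corners of the unit square), the function names, and the choice to ignore symbols other than A, C, G, T are assumptions made here for the example.

```python
from collections import Counter

# Assumed corner convention: A=(0,0), C=(0,1), G=(1,1), T=(1,0), as (x, y).
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def cgr_points(seq):
    """Return the CGR trajectory: start at the centre of the unit square
    and move halfway towards the corner of each successive base."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:  # assumption: ambiguous symbols are ignored
            continue
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def kmer_counts(seq, k):
    """Count overlapping words of length k made only of A, C, G, T."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if all(base in CORNERS for base in word):
            counts[word] += 1
    return counts


def cgr_cell(word):
    """Map a word to its (row, col) cell in the 2^k x 2^k CGR grid."""
    row = col = 0
    # The last base selects the coarsest quadrant, so it gives the highest bit.
    for base in reversed(word):
        cx, cy = CORNERS[base]
        col = col * 2 + cx
        row = row * 2 + cy
    return row, col


def cgr_grid(seq, k):
    """Build the 2^k x 2^k matrix of word counts laid out as a CGR image."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    for word, n in kmer_counts(seq, k).items():
        row, col = cgr_cell(word)
        grid[row][col] = n
    return grid
```

For example, `cgr_grid(sequence, 3)` would give an 8 x 8 matrix whose entries are the counts of the 64 possible three-letter words. Rendering that matrix as a grey-scale image is one way to obtain the kind of picture the abstract refers to. The cell indices are computed with integer arithmetic rather than from the floating-point trajectory, which avoids rounding issues on long sequences.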