Next Generation Sequencing Data and its Compression

Carpentieri B.
2019-01-01

Abstract

Over the past few years, the storage and network traffic consumed by sequenced biological data have increased dramatically. Genomic projects such as HapMap and 1000 Genomes have led to the collection and description of the genomes of 2,504 individuals from 26 populations, contributing to the exponential growth of databases of this type and to the development of increasingly efficient technologies. Thanks to the large-scale sequencing of DNA samples, interest and new research in these areas by the scientific community have grown rapidly. In a very short time, researchers have developed hardware tools, analysis software, algorithms, private databases and infrastructures to support genomics. In this paper we analyse different approaches to compressing the digital files, containing nucleotide sequences, that are generated by Next-Generation Sequencing tools. We discuss and evaluate the compression performance of generic compression tools such as gzip and bzip2 by comparing them with quip, a system designed specifically for genomic file compression.
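
As a rough illustration of the kind of comparison the abstract describes, the sketch below measures the compression ratio of gzip and bzip2 on a small synthetic FASTQ-like payload, using only Python's standard library. The data generator, the function names and the read parameters are assumptions made for this example and do not come from the paper. Random reads are also harder to compress than real sequencing data, so the numbers are not representative. quip is left out because it is an external tool with no standard-library binding.

```python
import bz2
import gzip
import random


def synthetic_fastq(n_reads=2000, read_len=100, seed=0):
    """Build a toy FASTQ payload (hypothetical data, not real reads)."""
    rng = random.Random(seed)
    records = []
    for i in range(n_reads):
        # Uniformly random bases and Phred+33 quality characters.
        seq = "".join(rng.choice("ACGT") for _ in range(read_len))
        qual = "".join(chr(33 + rng.randint(20, 40)) for _ in range(read_len))
        records.append(f"@read{i}\n{seq}\n+\n{qual}\n")
    return "".join(records).encode("ascii")


def compression_ratios(data):
    """Return original_size / compressed_size for each general-purpose codec."""
    return {
        "gzip": len(data) / len(gzip.compress(data, compresslevel=9)),
        "bzip2": len(data) / len(bz2.compress(data, compresslevel=9)),
    }


if __name__ == "__main__":
    payload = synthetic_fastq()
    for name, ratio in compression_ratios(payload).items():
        print(f"{name}: {ratio:.2f}x")
```

A specialised compressor can, in principle, exploit structure that these generic tools do not see, such as the four-letter nucleotide alphabet and the separate statistics of identifiers, bases and quality scores.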

Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4758581