High-throughput DNA sequence data compression

Zexuan Zhu, Yongpeng Zhang, Zhen Ji, Shan He, Xiao Yang

Research output: Contribution to journal › Article › peer-review

50 Citations (Scopus)
178 Downloads (Pure)

Abstract

The exponential growth of high-throughput DNA sequence data has posed great challenges for genomic data storage, retrieval and transmission. Compression is a critical tool for addressing these challenges, and many methods have been developed to reduce the storage size of genomes and sequencing data (reads, quality scores and metadata). However, genomic data are being generated faster than they can be meaningfully analyzed, leaving considerable scope for novel compression algorithms that directly facilitate data analysis, beyond data transfer and storage. In this article, we categorize and comprehensively review the existing compression methods specialized for genomic data, and present experimental results on compression ratio, memory usage, and compression and decompression time. We further present the remaining challenges and potential directions for future research.
Original language: English
Pages (from-to): 1-15
Journal: Briefings in Bioinformatics
Volume: 16
Issue number: 1
Early online date: 3 Dec 2013
Publication status: Published - 2015
