Compression of next-generation sequencing quality scores using memetic algorithm

Jiarui Zhou, Zhen Ji*, Zexuan Zhu, Shan He

*Corresponding author for this work

Research output: Contribution to journalArticlepeer-review

7 Citations (Scopus)

Abstract

Background: The exponential growth of next-generation sequencing (NGS) derived DNA data poses great challenges to data storage and transmission. Although many compression algorithms have been proposed for DNA reads in NGS data, few methods are designed specifically to handle the quality scores. Results: In this paper we present a memetic algorithm (MA) based NGS quality score data compressor, namely MMQSC. The algorithm extracts raw quality score sequences from FASTQ formatted files, and designs compression codebook using MA based multimodal optimization. The input data is then compressed in a substitutional manner. Experimental results on five representative NGS data sets show that MMQSC obtains higher compression ratio than the other state-of-the-art methods. Particularly, MMQSC is a lossless reference-free compression algorithm, yet obtains an average compression ratio of 22.82% on the experimental data sets. Conclusions: The proposed MMQSC compresses NGS quality score data effectively. It can be utilized to improve the overall compression ratio on FASTQ formatted files.

Original languageEnglish
Article number10
JournalBMC Bioinformatics
Volume15
DOIs
Publication statusPublished - 3 Dec 2014

ASJC Scopus subject areas

  • Applied Mathematics
  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications

Fingerprint

Dive into the research topics of 'Compression of next-generation sequencing quality scores using memetic algorithm'. Together they form a unique fingerprint.

Cite this