CURC: a CUDA-based reference-free read compressor

Shaohui Xie; Xiaotian He; Shan He; Zexuan Zhu

doi:10.1093/bioinformatics/btac333

Computer Science

Research output: Contribution to journal (peer-reviewed article)

Abstract

MOTIVATION: The data deluge of high-throughput sequencing (HTS) has posed great challenges to data storage and transfer. Many dedicated compression tools have been developed to address this problem. However, most existing compressors are built for the central processing unit (CPU) platform, which may be inefficient and expensive for handling large-scale HTS data. With the popularization of graphics processing units (GPUs), GPU-compatible sequencing data compressors have become desirable for exploiting the computing power of GPUs.

RESULTS: We present CURC, a GPU-accelerated reference-free read compressor for FASTQ files. Under a GPU-CPU heterogeneous parallel scheme, CURC implements highly efficient lossless compression of the DNA stream based on the pseudogenome approach and the CUDA library. Compared with other state-of-the-art reference-free read compressors, CURC achieves a 2- to 6-fold compression speedup with a competitive compression rate.
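
The pseudogenome idea named in the Results can be illustrated with a toy sketch. This is not CURC's algorithm or code: it is a hypothetical, CPU-only Python illustration that assumes error-free, forward-strand reads. Reads are greedily merged by suffix-prefix overlap into one string (the pseudogenome), so each read can be stored as an offset plus a length rather than as its full sequence. The names `suffix_prefix_overlap`, `build_pseudogenome` and `decode` and the `min_overlap` parameter are illustrative assumptions.

```python
from typing import List, Tuple


def suffix_prefix_overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`.

    Overlaps shorter than `min_len` are ignored (0 is returned).
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def build_pseudogenome(reads: List[str], min_overlap: int = 3) -> Tuple[str, List[int]]:
    """Greedily merge reads into one string and record each read's offset.

    Hypothetical toy: exact matches only; mismatches, reverse complements
    and N bases, which real read compressors must handle, are ignored.
    """
    if not reads:
        return "", []
    positions = [0] * len(reads)
    remaining = set(range(1, len(reads)))
    pg = reads[0]  # seed the pseudogenome with the first read
    while remaining:
        # Reads already present as substrings cost only an offset.
        contained = [i for i in remaining if reads[i] in pg]
        for i in contained:
            positions[i] = pg.find(reads[i])
            remaining.discard(i)
        if not remaining:
            break
        # Append the read whose prefix best overlaps the current tail
        # (ties broken by the smaller read index).
        best = max(
            remaining,
            key=lambda i: (suffix_prefix_overlap(pg, reads[i], min_overlap), -i),
        )
        k = suffix_prefix_overlap(pg, reads[best], min_overlap)
        positions[best] = len(pg) - k
        pg += reads[best][k:]
        remaining.discard(best)
    return pg, positions


def decode(pg: str, positions: List[int], lengths: List[int]) -> List[str]:
    """Recover every read from its offset and length (lossless for this toy)."""
    return [pg[p:p + n] for p, n in zip(positions, lengths)]


if __name__ == "__main__":
    reads = ["ACGTAC", "GTACGG", "TACGGT", "CGTA"]
    pg, pos = build_pseudogenome(reads)
    print(pg, pos)  # ACGTACGGT [0, 2, 3, 1]
    assert decode(pg, pos, [len(r) for r in reads]) == reads
```

On these four toy reads the pseudogenome holds 9 bases instead of 22. A practical compressor would also tolerate sequencing errors and entropy-code the pseudogenome and offsets; the GPU-CPU heterogeneous CUDA scheme described for CURC is not modelled here.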

AVAILABILITY AND IMPLEMENTATION: CURC can be downloaded from https://github.com/BioinfoSZU/CURC.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Original language	English
Pages (from-to)	3294-3296
Number of pages	3
Journal	Bioinformatics
Volume	38
Issue number	12
Early online date	17 May 2022
DOIs	https://doi.org/10.1093/bioinformatics/btac333
Publication status	Published - Jun 2022

Keywords

Sequence Analysis, DNA
Algorithms
Data Compression
High-Throughput Nucleotide Sequencing
Gene Library

Access to Document

10.1093/bioinformatics/btac333
Licence: None (all rights reserved)

Cite this

@article{092c4fc4bf3545af8ce95d8a8a8ff7f5,
  title = "{CURC}: a {CUDA}-based reference-free read compressor",
  abstract = "MOTIVATION: The data deluge of high-throughput sequencing (HTS) has posed great challenges to data storage and transfer. Many specific compression tools have been developed to solve this problem. However, most of the existing compressors are based on central processing unit (CPU) platform, which might be inefficient and expensive to handle large-scale HTS data. With the popularization of graphics processing units (GPUs), GPU-compatible sequencing data compressors become desirable to exploit the computing power of GPUs. RESULTS: We present a GPU-accelerated reference-free read compressor, namely CURC, for FASTQ files. Under a GPU-CPU heterogeneous parallel scheme, CURC implements highly efficient lossless compression of DNA stream based on the pseudogenome approach and CUDA library. CURC achieves 2-6-fold speedup of the compression with competitive compression rate, compared with other state-of-the-art reference-free read compressors. AVAILABILITY AND IMPLEMENTATION: CURC can be downloaded from https://github.com/BioinfoSZU/CURC. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.",
  keywords = "Sequence Analysis, DNA, Algorithms, Data Compression, High-Throughput Nucleotide Sequencing, Gene Library",
  author = "Shaohui Xie and Xiaotian He and Shan He and Zexuan Zhu",
  year = "2022",
  month = jun,
  doi = "10.1093/bioinformatics/btac333",
  language = "English",
  volume = "38",
  pages = "3294--3296",
  journal = "Bioinformatics",
  issn = "1367-4803",
  publisher = "Oxford University Press",
  number = "12",
}

TY  - JOUR
T1  - CURC: a CUDA-based reference-free read compressor
AU  - Xie, Shaohui
AU  - He, Xiaotian
AU  - He, Shan
AU  - Zhu, Zexuan
PY  - 2022/6
Y1  - 2022/6
AB  - MOTIVATION: The data deluge of high-throughput sequencing (HTS) has posed great challenges to data storage and transfer. Many specific compression tools have been developed to solve this problem. However, most of the existing compressors are based on central processing unit (CPU) platform, which might be inefficient and expensive to handle large-scale HTS data. With the popularization of graphics processing units (GPUs), GPU-compatible sequencing data compressors become desirable to exploit the computing power of GPUs. RESULTS: We present a GPU-accelerated reference-free read compressor, namely CURC, for FASTQ files. Under a GPU-CPU heterogeneous parallel scheme, CURC implements highly efficient lossless compression of DNA stream based on the pseudogenome approach and CUDA library. CURC achieves 2-6-fold speedup of the compression with competitive compression rate, compared with other state-of-the-art reference-free read compressors. AVAILABILITY AND IMPLEMENTATION: CURC can be downloaded from https://github.com/BioinfoSZU/CURC. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
KW  - Sequence Analysis, DNA
KW  - Algorithms
KW  - Data Compression
KW  - High-Throughput Nucleotide Sequencing
KW  - Gene Library
DO  - 10.1093/bioinformatics/btac333
M3  - Article
C2  - 35579371
SN  - 1367-4803
VL  - 38
SP  - 3294
EP  - 3296
JO  - Bioinformatics
JF  - Bioinformatics
IS  - 12
ER  - 
