Genome-guided transcript assembly by integrative analysis of RNA sequence data

Nathan Boley, Marcus H Stoiber, Benjamin W Booth, Kenneth H Wan, Roger A Hoskins, Peter J Bickel, Susan E Celniker, James Brown

Research output: Contribution to journal > Article > peer-review

34 Citations (Scopus)

Abstract

The identification of full-length transcripts entirely from short-read RNA sequencing data (RNA-seq) remains a challenge in the annotation of genomes. Here we describe an automated pipeline for genome annotation that integrates RNA-seq and gene-boundary data sets, which we call the Generalized RNA Integration Tool, or GRIT. Applying GRIT to Drosophila melanogaster short-read RNA-seq, cap analysis of gene expression (CAGE) and poly(A)-site-seq data collected for the modENCODE project, we recovered the vast majority of previously annotated transcripts and doubled the total number of transcripts cataloged. We found that 20% of protein-coding genes encode multiple protein-localization signals and that, in 20-day-old adult fly heads, genes with multiple polyadenylation sites are more common than genes with alternative splicing or alternative promoters. GRIT demonstrates 30% higher precision and recall than the most widely used transcript assembly tools. GRIT will facilitate the automated generation of high-quality genome annotations without the need for extensive manual annotation.

Original language: English
Pages (from-to): 341-346
Number of pages: 6
Journal: Nature Biotechnology
Volume: 32
Issue number: 4
Early online date: 16 Mar 2014
Publication status: Published - Apr 2014

Keywords

  • Animals
  • Chromosome Mapping
  • Drosophila melanogaster
  • Genome, Insect
  • Genomics
  • Molecular Sequence Annotation
  • RNA
  • Sequence Analysis, RNA
  • Journal Article
  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.
