Odintifier - A computational method for identifying insertions of organellar origin from modern and ancient high-throughput sequencing data based on haplotype phasing

Research output: Contribution to journalArticle

Standard

Odintifier - A computational method for identifying insertions of organellar origin from modern and ancient high-throughput sequencing data based on haplotype phasing. / Samaniego Castruita, Jose Alfredo; Zepeda Mendoza, Marie Lisandra; Barnett, Ross; Wales, Nathan; Gilbert, M. Thomas P.

In: BMC Bioinformatics, Vol. 16, No. 1, 232, 28.07.2015.

Research output: Contribution to journalArticle

Harvard

APA

Vancouver

Author

Bibtex

@article{5e6afc09d2cd4a03892571800306d8d4,
title = "Odintifier - A computational method for identifying insertions of organellar origin from modern and ancient high-throughput sequencing data based on haplotype phasing",
abstract = "Background: Cellular organelles with genomes of their own (e.g. plastids and mitochondria) can pass genetic sequences to other organellar genomes within the cell in many species across the eukaryote phylogeny. The extent of the occurrence of these organellar-derived inserted sequences (odins) is still unknown, but if not accounted for in genomic and phylogenetic studies, they can be a source of error. However, if correctly identified, these inserted sequences can be used for evolutionary and comparative genomic studies. Although such insertions can be detected using various laboratory and bioinformatic strategies, there is currently no straightforward way to apply them as a standard organellar genome assembly on next-generation sequencing data. Furthermore, most current methods for identification of such insertions are unsuitable for use on non-model organisms or ancient DNA datasets. Results: We present a bioinformatic method that uses phasing algorithms to reconstruct both source and inserted organelle sequences. The method was tested in different shotgun and organellar-enriched DNA high-throughput sequencing (HTS) datasets from ancient and modern samples. Specifically, we used datasets from lions (Panthera leo ssp. and Panthera leo leo) to characterize insertions from mitochondrial origin, and from common grapevine (Vitis vinifera) and bugle (Ajuga reptans) to characterize insertions derived from plastid genomes. Comparison of the results against other available organelle genome assembly methods demonstrated that our new method provides an improvement in the sequence assembly. Conclusion: Using datasets from a wide range of species and different levels of complexity we showed that our novel bioinformatic method based on phasing algorithms can be used to achieve the next two goals: i) reference-guided assembly of chloroplast/mitochondrial genomes from HTS data and ii) identification and simultaneous assembly of odins. This method represents the first application of haplotype phasing for automatic detection of odins and reference-based organellar genome assembly.",
keywords = "Ancient DNA, High-throughput sequencing, Mitochondrial assembly, Numt, Nupt, Odin, Phasing",
author = "{Samaniego Castruita}, {Jose Alfredo} and {Zepeda Mendoza}, {Marie Lisandra} and Ross Barnett and Nathan Wales and Gilbert, {M. Thomas P.}",
year = "2015",
month = "7",
day = "28",
doi = "10.1186/s12859-015-0682-1",
language = "English",
volume = "16",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "Springer",
number = "1",

}

RIS

TY - JOUR

T1 - Odintifier - A computational method for identifying insertions of organellar origin from modern and ancient high-throughput sequencing data based on haplotype phasing

AU - Samaniego Castruita, Jose Alfredo

AU - Zepeda Mendoza, Marie Lisandra

AU - Barnett, Ross

AU - Wales, Nathan

AU - Gilbert, M. Thomas P.

PY - 2015/7/28

Y1 - 2015/7/28

N2 - Background: Cellular organelles with genomes of their own (e.g. plastids and mitochondria) can pass genetic sequences to other organellar genomes within the cell in many species across the eukaryote phylogeny. The extent of the occurrence of these organellar-derived inserted sequences (odins) is still unknown, but if not accounted for in genomic and phylogenetic studies, they can be a source of error. However, if correctly identified, these inserted sequences can be used for evolutionary and comparative genomic studies. Although such insertions can be detected using various laboratory and bioinformatic strategies, there is currently no straightforward way to apply them as a standard organellar genome assembly on next-generation sequencing data. Furthermore, most current methods for identification of such insertions are unsuitable for use on non-model organisms or ancient DNA datasets. Results: We present a bioinformatic method that uses phasing algorithms to reconstruct both source and inserted organelle sequences. The method was tested in different shotgun and organellar-enriched DNA high-throughput sequencing (HTS) datasets from ancient and modern samples. Specifically, we used datasets from lions (Panthera leo ssp. and Panthera leo leo) to characterize insertions from mitochondrial origin, and from common grapevine (Vitis vinifera) and bugle (Ajuga reptans) to characterize insertions derived from plastid genomes. Comparison of the results against other available organelle genome assembly methods demonstrated that our new method provides an improvement in the sequence assembly. Conclusion: Using datasets from a wide range of species and different levels of complexity we showed that our novel bioinformatic method based on phasing algorithms can be used to achieve the next two goals: i) reference-guided assembly of chloroplast/mitochondrial genomes from HTS data and ii) identification and simultaneous assembly of odins. This method represents the first application of haplotype phasing for automatic detection of odins and reference-based organellar genome assembly.

AB - Background: Cellular organelles with genomes of their own (e.g. plastids and mitochondria) can pass genetic sequences to other organellar genomes within the cell in many species across the eukaryote phylogeny. The extent of the occurrence of these organellar-derived inserted sequences (odins) is still unknown, but if not accounted for in genomic and phylogenetic studies, they can be a source of error. However, if correctly identified, these inserted sequences can be used for evolutionary and comparative genomic studies. Although such insertions can be detected using various laboratory and bioinformatic strategies, there is currently no straightforward way to apply them as a standard organellar genome assembly on next-generation sequencing data. Furthermore, most current methods for identification of such insertions are unsuitable for use on non-model organisms or ancient DNA datasets. Results: We present a bioinformatic method that uses phasing algorithms to reconstruct both source and inserted organelle sequences. The method was tested in different shotgun and organellar-enriched DNA high-throughput sequencing (HTS) datasets from ancient and modern samples. Specifically, we used datasets from lions (Panthera leo ssp. and Panthera leo leo) to characterize insertions from mitochondrial origin, and from common grapevine (Vitis vinifera) and bugle (Ajuga reptans) to characterize insertions derived from plastid genomes. Comparison of the results against other available organelle genome assembly methods demonstrated that our new method provides an improvement in the sequence assembly. Conclusion: Using datasets from a wide range of species and different levels of complexity we showed that our novel bioinformatic method based on phasing algorithms can be used to achieve the next two goals: i) reference-guided assembly of chloroplast/mitochondrial genomes from HTS data and ii) identification and simultaneous assembly of odins. This method represents the first application of haplotype phasing for automatic detection of odins and reference-based organellar genome assembly.

KW - Ancient DNA

KW - High-throughput sequencing

KW - Mitochondrial assembly

KW - Numt

KW - Nupt

KW - Odin

KW - Phasing

UR - http://www.scopus.com/inward/record.url?scp=84937940616&partnerID=8YFLogxK

U2 - 10.1186/s12859-015-0682-1

DO - 10.1186/s12859-015-0682-1

M3 - Article

C2 - 26216337

AN - SCOPUS:84937940616

VL - 16

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

IS - 1

M1 - 232

ER -