Sampling variation of RAD-seq data from diploid and tetraploid potato (Solanum tuberosum L.)

Zhenyu Dang; Jixuan Yang; Lin Wang; Qin Tao; Fengjun Zhang; Yuxin Zhang; Zewei Luo

doi:10.3390/plants10020319

Biosciences

Research output: Contribution to journal › Article › peer-review

Abstract

New sequencing technology enables identification of genome-wide sequence-based variants at the population level and at a competitively low cost. Sequence variant-based molecular markers have motivated enormous interest in population and quantitative genetic analyses. Generation of the sequence data involves a sophisticated experimental process embedded with rich non-biological variation. Statistically, the sequencing process involves sampling DNA fragments from an individual sequence. Adequate knowledge of the sampling variation in sequence data generation is one of the key statistical properties for any downstream analysis of the data and for implementing statistically appropriate methods. This paper reports a thorough investigation on modeling the sampling variation of the sequence data from optimized RAD-seq (restriction site-associated DNA sequencing) experiments with two parents and their offspring of diploid and autotetraploid potato (Solanum tuberosum L.). The analysis shows significant overdispersion in the sampling variation of the sequence data relative to that expected under the multinomial distribution widely assumed in the literature, and provides statistical methods for modeling the variation and calculating the model parameters, which may be easily implemented in real sequence datasets. The optimized design of the RAD-seq experiments enabled effective control of presentation of undesirable chloroplast DNA and RNA genes in the sequence data generated.

Original language	English
Article number	319
Number of pages	12
Journal	Plants
Volume	10
Issue number	2
DOIs	https://doi.org/10.3390/plants10020319
Publication status	Published - 7 Feb 2021

Keywords

sampling variation
overdispersion
RAD-seq data
Solanum tuberosum L.

Access to Document

10.3390/plants10020319. Licence: Creative Commons Attribution (CC BY)

DangZ2021Sampling: Final published version, 2.52 MB. Licence: Creative Commons Attribution (CC BY)

Cite this

@article{6a197cc8a61742e2b0516a23aed51f60,

title = "Sampling variation of RAD-seq data from diploid and tetraploid potato (Solanum tuberosum L.)",

abstract = "New sequencing technology enables identification of genome-wide sequence-based variants at the population level and at a competitively low cost. Sequence variant-based molecular markers have motivated enormous interest in population and quantitative genetic analyses. Generation of the sequence data involves a sophisticated experimental process embedded with rich non-biological variation. Statistically, the sequencing process involves sampling DNA fragments from an individual sequence. Adequate knowledge of the sampling variation in sequence data generation is one of the key statistical properties for any downstream analysis of the data and for implementing statistically appropriate methods. This paper reports a thorough investigation on modeling the sampling variation of the sequence data from optimized RAD-seq (restriction site-associated DNA sequencing) experiments with two parents and their offspring of diploid and autotetraploid potato (Solanum tuberosum L.). The analysis shows significant overdispersion in the sampling variation of the sequence data relative to that expected under the multinomial distribution widely assumed in the literature, and provides statistical methods for modeling the variation and calculating the model parameters, which may be easily implemented in real sequence datasets. The optimized design of the RAD-seq experiments enabled effective control of presentation of undesirable chloroplast DNA and RNA genes in the sequence data generated.",

keywords = "sampling variation, overdispersion, RAD-seq data, Solanum tuberosum L.",

author = "Zhenyu Dang and Jixuan Yang and Lin Wang and Qin Tao and Fengjun Zhang and Yuxin Zhang and Zewei Luo",

year = "2021",

month = feb,

day = "7",

doi = "10.3390/plants10020319",

language = "English",

volume = "10",

journal = "Plants",

issn = "2223-7747",

publisher = "Multidisciplinary Digital Publishing Institute (MDPI)",

number = "2",

}

TY - JOUR

T1 - Sampling variation of RAD-seq data from diploid and tetraploid potato (Solanum tuberosum L.)

AU - Dang, Zhenyu

AU - Yang, Jixuan

AU - Wang, Lin

AU - Tao, Qin

AU - Zhang, Fengjun

AU - Zhang, Yuxin

AU - Luo, Zewei

PY - 2021/2/7

Y1 - 2021/2/7

AB - New sequencing technology enables identification of genome-wide sequence-based variants at the population level and at a competitively low cost. Sequence variant-based molecular markers have motivated enormous interest in population and quantitative genetic analyses. Generation of the sequence data involves a sophisticated experimental process embedded with rich non-biological variation. Statistically, the sequencing process involves sampling DNA fragments from an individual sequence. Adequate knowledge of the sampling variation in sequence data generation is one of the key statistical properties for any downstream analysis of the data and for implementing statistically appropriate methods. This paper reports a thorough investigation on modeling the sampling variation of the sequence data from optimized RAD-seq (restriction site-associated DNA sequencing) experiments with two parents and their offspring of diploid and autotetraploid potato (Solanum tuberosum L.). The analysis shows significant overdispersion in the sampling variation of the sequence data relative to that expected under the multinomial distribution widely assumed in the literature, and provides statistical methods for modeling the variation and calculating the model parameters, which may be easily implemented in real sequence datasets. The optimized design of the RAD-seq experiments enabled effective control of presentation of undesirable chloroplast DNA and RNA genes in the sequence data generated.

KW - sampling variation

KW - overdispersion

KW - RAD-seq data

KW - Solanum tuberosum L.

U2 - 10.3390/plants10020319

DO - 10.3390/plants10020319

M3 - Article

SN - 2223-7747

VL - 10

JO - Plants

JF - Plants

IS - 2

M1 - 319

ER -
