Basic Properties and Information Theory of Audic-Claverie Statistic for Analyzing cDNA Arrays

Peter Tino

doi:10.1186/1471-2105-10-310

Computer Science

Research output: Contribution to journal › Article › peer-review

Abstract

Background: The Audic-Claverie method [1] has been, and continues to be, a popular approach for detecting differentially expressed genes in the SAGE (serial analysis of gene expression) framework. The method assumes that, under the null hypothesis, the tag counts of the same gene in two libraries come from the same but unknown Poisson distribution. The problem is that each SAGE library represents only a single measurement. We ask: given that the tag count samples from SAGE libraries are extremely limited, how useful is the Audic-Claverie methodology in practice? We rigorously analyze the A-C statistic, which forms the backbone of the methodology and represents our knowledge of the underlying tag-generating process based on a single observation.

Results: We show that the A-C statistic and the underlying Poisson distribution of the tag counts share the same mode structure. Moreover, the Kullback-Leibler (K-L) divergence from the true unknown Poisson distribution to the A-C statistic is minimized when the A-C statistic is conditioned on the mode of the Poisson distribution. Most importantly, the expectation of this K-L divergence never exceeds 1/2 bit.

Conclusion: A rigorous underpinning of the Audic-Claverie methodology has been missing. Our results constitute a rigorous argument supporting the use of the Audic-Claverie method even though SAGE libraries represent very sparse samples.
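The two Results claims about mode structure and the minimizing conditioning value can be illustrated numerically. The Python sketch below is not the paper's code. It assumes two SAGE libraries of equal size, in which case the A-C statistic for observing y tags of a gene in the second library, given x tags in the first, is p(y|x) = (x+y)! / (x! y! 2^(x+y+1)). The function names (`ac_prob`, `kl_poisson_to_ac`, etc.) are hypothetical, and the infinite sum over y is truncated where the Poisson mass is negligible.

```python
import math

# Sketch only: assumes two SAGE libraries of EQUAL size, so the Audic-Claverie
# statistic reduces to p(y | x) = (x + y)! / (x! * y! * 2**(x + y + 1)).
# Function names are hypothetical, chosen for this illustration.


def ac_log_prob(y: int, x: int) -> float:
    """Natural log of the A-C statistic p(y | x), computed via lgamma to avoid overflow."""
    return (math.lgamma(x + y + 1) - math.lgamma(x + 1) - math.lgamma(y + 1)
            - (x + y + 1) * math.log(2.0))


def ac_prob(y: int, x: int) -> float:
    """A-C probability of y tags in library 2 given x tags in library 1."""
    return math.exp(ac_log_prob(y, x))


def poisson_log_pmf(y: int, lam: float) -> float:
    """Natural log of the Poisson(lam) probability of y; requires lam > 0."""
    return -lam + y * math.log(lam) - math.lgamma(y + 1)


def kl_poisson_to_ac(lam: float, x: int) -> float:
    """K-L divergence D(Poisson(lam) || p(. | x)) in bits.

    The infinite sum over y is truncated far beyond the bulk of the
    Poisson mass, so the neglected tail is numerically insignificant.
    """
    y_max = int(lam + 12 * math.sqrt(lam) + 30)
    total = 0.0
    for y in range(y_max + 1):
        log_p = poisson_log_pmf(y, lam)
        total += math.exp(log_p) * (log_p - ac_log_prob(y, x))
    return total / math.log(2.0)


if __name__ == "__main__":
    # Mode structure: with x = 5 the A-C statistic peaks at both y = 4 and y = 5,
    # matching the two modes of a Poisson distribution with mean 5.
    probs = [ac_prob(y, 5) for y in range(40)]
    top = max(probs)
    print([y for y, p in enumerate(probs) if math.isclose(p, top, rel_tol=1e-9)])  # [4, 5]

    # For lam = 2.5 (Poisson mode 2), the divergence is smallest at x = 2.
    lam = 2.5
    kls = {x: kl_poisson_to_ac(lam, x) for x in range(15)}
    print(min(kls, key=kls.get))  # 2
```

Working in log space with `math.lgamma` keeps the factorials from overflowing for large tag counts. These checks cover single parameter values only and are not a substitute for the general results stated in the abstract.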

Original language	English
Article number	310
Journal	BMC Bioinformatics
Volume	10
Issue number	1
Early online date	23 Sept 2009
DOIs	https://doi.org/10.1186/1471-2105-10-310
Publication status	Published - 23 Sept 2009

Access to Document

Tino_Basic_Properties_Information_Theory_BMC_Bioinformatics_2009
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Checked July 2015.
Final published version, 445 KB. Licence: Creative Commons Attribution (CC BY)

Cite this

@article{284ce07b90064e489949f154400b641b,
title = "Basic Properties and Information Theory of {Audic-Claverie} Statistic for Analyzing {cDNA} Arrays",
abstract = "Background: The Audic-Claverie method [1] has been and still continues to be a popular approach for detection of differentially expressed genes in the SAGE framework. The method is based on the assumption that under the null hypothesis tag counts of the same gene in two libraries come from the same but unknown Poisson distribution. The problem is that each SAGE library represents only a single measurement. We ask: Given that the tag count samples from SAGE libraries are extremely limited, how useful actually is the Audic-Claverie methodology? We rigorously analyze the A-C statistic that forms a backbone of the methodology and represents our knowledge of the underlying tag generating process based on one observation. Results: We show that the A-C statistic and the underlying Poisson distribution of the tag counts share the same mode structure. Moreover, the K-L divergence from the true unknown Poisson distribution to the A-C statistic is minimized when the A-C statistic is conditioned on the mode of the Poisson distribution. Most importantly, the expectation of this K-L divergence never exceeds 1/2 bit. Conclusion: A rigorous underpinning of the Audic-Claverie methodology has been missing. Our results constitute a rigorous argument supporting the use of Audic-Claverie method even though the SAGE libraries represent very sparse samples.",
author = "Peter Tino",
year = "2009",
month = sep,
day = "23",
doi = "10.1186/1471-2105-10-310",
language = "English",
volume = "10",
journal = "BMC Bioinformatics",
publisher = "Springer",
number = "1",
pages = "310",
}

TY  - JOUR
T1  - Basic Properties and Information Theory of Audic-Claverie Statistic for Analyzing cDNA Arrays
AU  - Tino, Peter
PY  - 2009/9/23
Y1  - 2009/9/23
N2  - Background: The Audic-Claverie method [1] has been and still continues to be a popular approach for detection of differentially expressed genes in the SAGE framework. The method is based on the assumption that under the null hypothesis tag counts of the same gene in two libraries come from the same but unknown Poisson distribution. The problem is that each SAGE library represents only a single measurement. We ask: Given that the tag count samples from SAGE libraries are extremely limited, how useful actually is the Audic-Claverie methodology? We rigorously analyze the A-C statistic that forms a backbone of the methodology and represents our knowledge of the underlying tag generating process based on one observation. Results: We show that the A-C statistic and the underlying Poisson distribution of the tag counts share the same mode structure. Moreover, the K-L divergence from the true unknown Poisson distribution to the A-C statistic is minimized when the A-C statistic is conditioned on the mode of the Poisson distribution. Most importantly, the expectation of this K-L divergence never exceeds 1/2 bit. Conclusion: A rigorous underpinning of the Audic-Claverie methodology has been missing. Our results constitute a rigorous argument supporting the use of Audic-Claverie method even though the SAGE libraries represent very sparse samples.
AB  - Background: The Audic-Claverie method [1] has been and still continues to be a popular approach for detection of differentially expressed genes in the SAGE framework. The method is based on the assumption that under the null hypothesis tag counts of the same gene in two libraries come from the same but unknown Poisson distribution. The problem is that each SAGE library represents only a single measurement. We ask: Given that the tag count samples from SAGE libraries are extremely limited, how useful actually is the Audic-Claverie methodology? We rigorously analyze the A-C statistic that forms a backbone of the methodology and represents our knowledge of the underlying tag generating process based on one observation. Results: We show that the A-C statistic and the underlying Poisson distribution of the tag counts share the same mode structure. Moreover, the K-L divergence from the true unknown Poisson distribution to the A-C statistic is minimized when the A-C statistic is conditioned on the mode of the Poisson distribution. Most importantly, the expectation of this K-L divergence never exceeds 1/2 bit. Conclusion: A rigorous underpinning of the Audic-Claverie methodology has been missing. Our results constitute a rigorous argument supporting the use of Audic-Claverie method even though the SAGE libraries represent very sparse samples.
U2  - 10.1186/1471-2105-10-310
DO  - 10.1186/1471-2105-10-310
M3  - Article
C2  - 19775462
VL  - 10
JO  - BMC Bioinformatics
JF  - BMC Bioinformatics
IS  - 1
M1  - 310
ER  - 
