Probabilistic modeling of bifurcations in single-cell gene expression data using a Bayesian mixture of factor analyzers

Kieran Campbell; Christopher Yau

doi:10.12688/wellcomeopenres.11087.1

Cancer and Genomic Sciences

Research output: Contribution to journal › Article › peer-review

15 Citations (Scopus)

Abstract

Modeling bifurcations in single-cell transcriptomics data has become an increasingly popular field of research. Several methods have been proposed to infer bifurcation structure from such data, but all rely on heuristic non-probabilistic inference. Here we propose the first generative, fully probabilistic model for such inference based on a Bayesian hierarchical mixture of factor analyzers. Our model exhibits competitive performance on large datasets despite implementing full Markov chain Monte Carlo sampling, and its unique hierarchical prior structure enables automatic determination of genes driving the bifurcation process. We additionally propose an empirical-Bayes-like extension that deals with the high levels of zero-inflation in single-cell RNA-seq data and quantify when such models are useful. We apply our model to both real and simulated single-cell gene expression data and compare the results to existing pseudotime methods. Finally, we discuss both the merits and weaknesses of such a unified, probabilistic approach in the context of practical bioinformatics analyses.

Original language	English
Journal	Wellcome Open Research
Volume	2
Issue number	19
Early online date	15 Mar 2017
DOIs	https://doi.org/10.12688/wellcomeopenres.11087.1
Publication status	E-pub ahead of print - 15 Mar 2017

Access to Document

10.12688/wellcomeopenres.11087.1
Licence: Creative Commons: Attribution (CC BY)

Cite this

@article{66eefd3df5fc4d2d946266da14aec09f,
title = "Probabilistic modeling of bifurcations in single-cell gene expression data using a Bayesian mixture of factor analyzers",
abstract = "Modeling bifurcations in single-cell transcriptomics data has become an increasingly popular field of research. Several methods have been proposed to infer bifurcation structure from such data, but all rely on heuristic non-probabilistic inference. Here we propose the first generative, fully probabilistic model for such inference based on a Bayesian hierarchical mixture of factor analyzers. Our model exhibits competitive performance on large datasets despite implementing full Markov chain Monte Carlo sampling, and its unique hierarchical prior structure enables automatic determination of genes driving the bifurcation process. We additionally propose an empirical-Bayes-like extension that deals with the high levels of zero-inflation in single-cell RNA-seq data and quantify when such models are useful. We apply our model to both real and simulated single-cell gene expression data and compare the results to existing pseudotime methods. Finally, we discuss both the merits and weaknesses of such a unified, probabilistic approach in the context of practical bioinformatics analyses.",
author = "Kieran Campbell and Christopher Yau",
year = "2017",
month = mar,
day = "15",
doi = "10.12688/wellcomeopenres.11087.1",
language = "English",
volume = "2",
journal = "Wellcome Open Research",
issn = "2398-502X",
publisher = "Wellcome Trust",
number = "19",
}

TY - JOUR

T1 - Probabilistic modeling of bifurcations in single-cell gene expression data using a Bayesian mixture of factor analyzers

AU - Campbell, Kieran

AU - Yau, Christopher

PY - 2017/3/15

Y1 - 2017/3/15

N2 - Modeling bifurcations in single-cell transcriptomics data has become an increasingly popular field of research. Several methods have been proposed to infer bifurcation structure from such data, but all rely on heuristic non-probabilistic inference. Here we propose the first generative, fully probabilistic model for such inference based on a Bayesian hierarchical mixture of factor analyzers. Our model exhibits competitive performance on large datasets despite implementing full Markov chain Monte Carlo sampling, and its unique hierarchical prior structure enables automatic determination of genes driving the bifurcation process. We additionally propose an empirical-Bayes-like extension that deals with the high levels of zero-inflation in single-cell RNA-seq data and quantify when such models are useful. We apply our model to both real and simulated single-cell gene expression data and compare the results to existing pseudotime methods. Finally, we discuss both the merits and weaknesses of such a unified, probabilistic approach in the context of practical bioinformatics analyses.

AB - Modeling bifurcations in single-cell transcriptomics data has become an increasingly popular field of research. Several methods have been proposed to infer bifurcation structure from such data, but all rely on heuristic non-probabilistic inference. Here we propose the first generative, fully probabilistic model for such inference based on a Bayesian hierarchical mixture of factor analyzers. Our model exhibits competitive performance on large datasets despite implementing full Markov chain Monte Carlo sampling, and its unique hierarchical prior structure enables automatic determination of genes driving the bifurcation process. We additionally propose an empirical-Bayes-like extension that deals with the high levels of zero-inflation in single-cell RNA-seq data and quantify when such models are useful. We apply our model to both real and simulated single-cell gene expression data and compare the results to existing pseudotime methods. Finally, we discuss both the merits and weaknesses of such a unified, probabilistic approach in the context of practical bioinformatics analyses.

U2 - 10.12688/wellcomeopenres.11087.1

DO - 10.12688/wellcomeopenres.11087.1

M3 - Article

SN - 2398-502X

VL - 2

JO - Wellcome Open Research

JF - Wellcome Open Research

IS - 19

ER -
