pcaReduce: hierarchical clustering of single cell transcriptional profiles

Justina Zurauskiene; Christopher Yau

doi:10.1186/s12859-016-0984-y

Research output: Contribution to journal › Article › peer-review

105 Citations (Scopus)

206 Downloads (Pure)

Abstract

BACKGROUND: Advances in single cell genomics provide a way of routinely generating transcriptomics data at the single cell level. A frequent requirement of single cell expression analysis is the identification of novel patterns of heterogeneity across single cells that might explain complex cellular states or tissue composition. To date, classical statistical analysis tools have been routinely applied, but there is considerable scope for the development of novel statistical approaches that are better adapted to the challenges of inferring cellular hierarchies.

RESULTS: We have developed a novel agglomerative clustering method that we call pcaReduce to generate a cell state hierarchy where each cluster branch is associated with a principal component of variation that can be used to differentiate two cell states. Using two real single cell datasets, we compared our approach to other commonly used statistical techniques, such as K-means and hierarchical clustering. We found that pcaReduce was able to give more consistent clustering structures when compared to broad and detailed cell type labels.

CONCLUSIONS: Our novel integration of principal components analysis and hierarchical clustering establishes a connection between the representation of the expression data and the number of cell types that can be discovered. In doing so we found that pcaReduce performs better than either technique in isolation in terms of characterising putative cell states. Our methodology is complementary to other single cell clustering techniques and adds to a growing palette of single cell bioinformatics tools for profiling heterogeneous cell populations.
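
For orientation only, the idea of tying each merge in a cluster hierarchy to a principal component might be sketched as below. This is not the authors' pcaReduce implementation: the abstract does not state the initial clustering or the merge rule, so the k-means start, the closest-centroid merge criterion, and the function names `pca_scores`, `kmeans` and `merge_hierarchy` are all assumptions made for illustration.

```python
import numpy as np


def pca_scores(X, q):
    """Project the rows of X onto the top-q principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :q] * S[:q]


def kmeans(Y, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (assumed starting point); returns integer labels."""
    rng = np.random.default_rng(seed)
    centers = Y[rng.choice(len(Y), size=k, replace=False)]
    for _ in range(n_iter):
        d = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Y[labels == j].mean(axis=0)
    return labels


def merge_hierarchy(X, q):
    """Return {number_of_clusters: labels}, from the initial clustering down to 1.

    Assumed scheme: cluster in q principal components, then repeatedly merge
    the two clusters with the closest centroids and drop the lowest-variance
    remaining component, so each merge is paired with one component.
    """
    Y = pca_scores(X, q)
    labels = kmeans(Y, q + 1)
    levels = {len(np.unique(labels)): labels.copy()}
    while len(np.unique(labels)) > 1:
        ids = np.unique(labels)
        cents = np.array([Y[labels == i].mean(axis=0) for i in ids])
        d = ((cents[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
        np.fill_diagonal(d, np.inf)          # ignore self-distances
        a, b = np.unravel_index(d.argmin(), d.shape)
        labels[labels == ids[b]] = ids[a]    # merge cluster b into cluster a
        if Y.shape[1] > 1:
            Y = Y[:, :-1]                    # drop the last (lowest-variance) PC
        levels[len(np.unique(labels))] = labels.copy()
    return levels
```

Calling `merge_hierarchy(X, q)` on a cells-by-genes matrix `X` yields one labelling per level of the hierarchy, which could then be compared against known cell type labels at different granularities.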

Original language	English
Article number	140
Number of pages	11
Journal	BMC Bioinformatics
Volume	17
DOIs	https://doi.org/10.1186/s12859-016-0984-y
Publication status	Published - 22 Mar 2016

Keywords

Algorithms
Animals
Cell Line
Cluster Analysis
Humans
Principal Component Analysis
RNA
Sequence Analysis, RNA
Single-Cell Analysis
Transcriptome
Journal Article
Research Support, Non-U.S. Gov't

Access to Document

10.1186/s12859-016-0984-y
Licence: Creative Commons: Attribution (CC BY)

Zurauskiene_&_Yau_pcaReduce_hierarchical_BMC_Bioinformatics_2016
First published in BMC Bioinformatics https://doi.org/10.1186/s12859-016-0984-y
Final published version, 2.48 MB
Licence: Creative Commons: Attribution (CC BY)

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-0984-y
Licence: Creative Commons: Attribution (CC BY)

Cite this

@article{9b6194cb0120413ca618f784abc4e9aa,

title = "pcaReduce: hierarchical clustering of single cell transcriptional profiles",

abstract = "BACKGROUND: Advances in single cell genomics provide a way of routinely generating transcriptomics data at the single cell level. A frequent requirement of single cell expression analysis is the identification of novel patterns of heterogeneity across single cells that might explain complex cellular states or tissue composition. To date, classical statistical analysis tools have been routinely applied, but there is considerable scope for the development of novel statistical approaches that are better adapted to the challenges of inferring cellular hierarchies. RESULTS: We have developed a novel agglomerative clustering method that we call pcaReduce to generate a cell state hierarchy where each cluster branch is associated with a principal component of variation that can be used to differentiate two cell states. Using two real single cell datasets, we compared our approach to other commonly used statistical techniques, such as K-means and hierarchical clustering. We found that pcaReduce was able to give more consistent clustering structures when compared to broad and detailed cell type labels. CONCLUSIONS: Our novel integration of principal components analysis and hierarchical clustering establishes a connection between the representation of the expression data and the number of cell types that can be discovered. In doing so we found that pcaReduce performs better than either technique in isolation in terms of characterising putative cell states. Our methodology is complementary to other single cell clustering techniques and adds to a growing palette of single cell bioinformatics tools for profiling heterogeneous cell populations.",

keywords = "Algorithms, Animals, Cell Line, Cluster Analysis, Humans, Principal Component Analysis, RNA, Sequence Analysis, RNA, Single-Cell Analysis, Transcriptome, Journal Article, Research Support, Non-U.S. Gov't",

author = "Justina Zurauskiene and Christopher Yau",

year = "2016",

month = mar,

day = "22",

doi = "10.1186/s12859-016-0984-y",

language = "English",

volume = "17",

journal = "BMC Bioinformatics",

issn = "1471-2105",

publisher = "Springer",

}

TY - JOUR

T1 - pcaReduce: hierarchical clustering of single cell transcriptional profiles

AU - Zurauskiene, Justina

AU - Yau, Christopher

PY - 2016/3/22

Y1 - 2016/3/22

AB - BACKGROUND: Advances in single cell genomics provide a way of routinely generating transcriptomics data at the single cell level. A frequent requirement of single cell expression analysis is the identification of novel patterns of heterogeneity across single cells that might explain complex cellular states or tissue composition. To date, classical statistical analysis tools have been routinely applied, but there is considerable scope for the development of novel statistical approaches that are better adapted to the challenges of inferring cellular hierarchies. RESULTS: We have developed a novel agglomerative clustering method that we call pcaReduce to generate a cell state hierarchy where each cluster branch is associated with a principal component of variation that can be used to differentiate two cell states. Using two real single cell datasets, we compared our approach to other commonly used statistical techniques, such as K-means and hierarchical clustering. We found that pcaReduce was able to give more consistent clustering structures when compared to broad and detailed cell type labels. CONCLUSIONS: Our novel integration of principal components analysis and hierarchical clustering establishes a connection between the representation of the expression data and the number of cell types that can be discovered. In doing so we found that pcaReduce performs better than either technique in isolation in terms of characterising putative cell states. Our methodology is complementary to other single cell clustering techniques and adds to a growing palette of single cell bioinformatics tools for profiling heterogeneous cell populations.

KW - Algorithms

KW - Animals

KW - Cell Line

KW - Cluster Analysis

KW - Humans

KW - Principal Component Analysis

KW - RNA

KW - Sequence Analysis, RNA

KW - Single-Cell Analysis

KW - Transcriptome

KW - Journal Article

KW - Research Support, Non-U.S. Gov't

U2 - 10.1186/s12859-016-0984-y

DO - 10.1186/s12859-016-0984-y

M3 - Article

C2 - 27005807

SN - 1471-2105

VL - 17

JO - BMC Bioinformatics

JF - BMC Bioinformatics

M1 - 140

ER -
