A Graph Theoretical Approach to Data Fusion

Justina Zurauskiene, Dirk Paul, Michael Stumpf

doi:10.1515/sagmb-2016-0016

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

204 Downloads (Pure)

Abstract

The rapid development of high throughput experimental techniques has resulted in a growing diversity of genomic datasets being produced and requiring analysis. Therefore, it is increasingly being recognized that we can gain deeper understanding about underlying biology by combining the insights obtained from multiple, diverse datasets. Thus we propose a novel scalable computational approach to unsupervised data fusion. Our technique exploits network representations of the data to identify similarities among the datasets. We may work within the Bayesian formalism, using Bayesian nonparametric approaches to model each dataset; or (for fast, approximate, and massive scale data fusion) can naturally switch to more heuristic modeling techniques. An advantage of the proposed approach is that each dataset can initially be modeled independently (in parallel), before applying a fast post-processing step to perform data integration. This allows us to incorporate new experimental data in an online fashion, without having to rerun all of the analysis. We first demonstrate the applicability of our tool on artificial data, and then on examples from the literature, which include yeast cell cycle, breast cancer and sporadic inclusion body myositis datasets.
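
The two-stage idea in the abstract (model each dataset independently, then integrate in a fast post-processing step over network representations) can be illustrated with a small sketch. This is not the authors' implementation: it assumes each dataset has already been clustered, that a clustering is represented as a graph linking items that share a cluster, and that fusion keeps only the edges present in every graph. The function names `cluster_graph` and `fuse` and the gene labels are hypothetical.

```python
from itertools import combinations


def cluster_graph(labels):
    """Return the edge set linking every pair of items that share a cluster label."""
    edges = set()
    for a, b in combinations(sorted(labels), 2):
        if labels[a] == labels[b]:
            edges.add((a, b))
    return edges


def fuse(edge_sets, items):
    """Keep edges found in every dataset, then return the connected components."""
    shared = set.intersection(*edge_sets)  # assumed fusion rule: edge intersection
    parent = {i: i for i in items}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in shared:
        parent[find(a)] = find(b)

    groups = {}
    for i in items:
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical cluster labels from two independently modeled datasets.
expression = {"g1": 0, "g2": 0, "g3": 0, "g4": 1}
binding = {"g1": 0, "g2": 0, "g3": 1, "g4": 1}

graphs = [cluster_graph(expression), cluster_graph(binding)]
fused = fuse(graphs, sorted(expression))
print(fused)
```

Under these assumptions, a new dataset only needs its own clustering and graph; appending that graph to `graphs` and calling `fuse` again reuses the earlier per-dataset results, which mirrors the online updating described in the abstract.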

Original language	English
Pages (from-to)	107-122
Number of pages	16
Journal	Statistical Applications in Genetics and Molecular Biology
Volume	15
Issue number	2
DOIs	https://doi.org/10.1515/sagmb-2016-0016
Publication status	Published - 18 Mar 2016

Keywords

graph-theoretic methods
functional genomics
data integration
clustering

Access to Document

10.1515/sagmb-2016-0016
Licence: Creative Commons: Attribution-NonCommercial-NoDerivs (CC BY-NC-ND)

Zurauskiene_et_al_A_graph_theoretical_Statistical_Applications_in_Genetics_and_Molecular_Biology_2018
Checked for eligibility 22/10/2018. First published in Statistical Applications in Genetics and Molecular Biology, https://doi.org/10.1515/sagmb-2016-0016
Final published version, 2.29 MB
Licence: Creative Commons: Attribution-NonCommercial-NoDerivs (CC BY-NC-ND)

https://www.degruyter.com/view/j/sagmb.2016.15.issue-2/sagmb-2016-0016/sagmb-2016-0016.xml
Licence: Creative Commons: Attribution-NonCommercial-NoDerivs (CC BY-NC-ND)

Cite this

@article{095eb60b84f846e9909212f4e08aac03,

title = "A Graph Theoretical Approach to Data Fusion",

abstract = "The rapid development of high throughput experimental techniques has resulted in a growing diversity of genomic datasets being produced and requiring analysis. Therefore, it is increasingly being recognized that we can gain deeper understanding about underlying biology by combining the insights obtained from multiple, diverse datasets. Thus we propose a novel scalable computational approach to unsupervised data fusion. Our technique exploits network representations of the data to identify similarities among the datasets. We may work within the Bayesian formalism, using Bayesian nonparametric approaches to model each dataset; or (for fast, approximate, and massive scale data fusion) can naturally switch to more heuristic modeling techniques. An advantage of the proposed approach is that each dataset can initially be modeled independently (in parallel), before applying a fast post-processing step to perform data integration. This allows us to incorporate new experimental data in an online fashion, without having to rerun all of the analysis. We first demonstrate the applicability of our tool on artificial data, and then on examples from the literature, which include yeast cell cycle, breast cancer and sporadic inclusion body myositis datasets.",

keywords = "graph-theoretic methods, functional genomics, data integration, clustering",

author = "Justina Zurauskiene and Dirk Paul and Michael Stumpf",

year = "2016",

month = mar,

day = "18",

doi = "10.1515/sagmb-2016-0016",

language = "English",

volume = "15",

pages = "107--122",

journal = "Statistical Applications in Genetics and Molecular Biology",

issn = "2194-6302",

publisher = "De Gruyter",

number = "2",

}

TY - JOUR

T1 - A Graph Theoretical Approach to Data Fusion

AU - Zurauskiene, Justina

AU - Paul, Dirk

AU - Stumpf, Michael

PY - 2016/3/18

Y1 - 2016/3/18

N2 - The rapid development of high throughput experimental techniques has resulted in a growing diversity of genomic datasets being produced and requiring analysis. Therefore, it is increasingly being recognized that we can gain deeper understanding about underlying biology by combining the insights obtained from multiple, diverse datasets. Thus we propose a novel scalable computational approach to unsupervised data fusion. Our technique exploits network representations of the data to identify similarities among the datasets. We may work within the Bayesian formalism, using Bayesian nonparametric approaches to model each dataset; or (for fast, approximate, and massive scale data fusion) can naturally switch to more heuristic modeling techniques. An advantage of the proposed approach is that each dataset can initially be modeled independently (in parallel), before applying a fast post-processing step to perform data integration. This allows us to incorporate new experimental data in an online fashion, without having to rerun all of the analysis. We first demonstrate the applicability of our tool on artificial data, and then on examples from the literature, which include yeast cell cycle, breast cancer and sporadic inclusion body myositis datasets.

AB - The rapid development of high throughput experimental techniques has resulted in a growing diversity of genomic datasets being produced and requiring analysis. Therefore, it is increasingly being recognized that we can gain deeper understanding about underlying biology by combining the insights obtained from multiple, diverse datasets. Thus we propose a novel scalable computational approach to unsupervised data fusion. Our technique exploits network representations of the data to identify similarities among the datasets. We may work within the Bayesian formalism, using Bayesian nonparametric approaches to model each dataset; or (for fast, approximate, and massive scale data fusion) can naturally switch to more heuristic modeling techniques. An advantage of the proposed approach is that each dataset can initially be modeled independently (in parallel), before applying a fast post-processing step to perform data integration. This allows us to incorporate new experimental data in an online fashion, without having to rerun all of the analysis. We first demonstrate the applicability of our tool on artificial data, and then on examples from the literature, which include yeast cell cycle, breast cancer and sporadic inclusion body myositis datasets.

KW - graph-theoretic methods

KW - functional genomics

KW - data integration

KW - clustering

U2 - 10.1515/sagmb-2016-0016

DO - 10.1515/sagmb-2016-0016

M3 - Article

SN - 2194-6302

VL - 15

SP - 107

EP - 122

JO - Statistical Applications in Genetics and Molecular Biology

JF - Statistical Applications in Genetics and Molecular Biology

IS - 2

ER -
