Inclusion of neighboring base interdependencies substantially improves genome-wide prokaryotic transcription factor binding site prediction.

Rafik Salama; Dov Stekel

doi:10.1093/nar/gkq274

Research output: Contribution to journal › Article

19 Citations (Scopus)

Abstract

Prediction of transcription factor binding sites is an important challenge in genome analysis. The advent of next generation genome sequencing technologies makes the development of effective computational approaches particularly imperative. We have developed a novel training-based methodology intended for prokaryotic transcription factor binding site prediction. Our methodology extends existing models by taking into account base interdependencies between neighbouring positions using conditional probabilities and includes genomic background weighting. This has been tested against other existing and novel methodologies including position-specific weight matrices, first-order Hidden Markov Models and joint probability models. We have also tested the use of gapped and ungapped alignments and the inclusion or exclusion of background weighting. We show that our best method enhances binding site prediction for all of the 22 Escherichia coli transcription factors with at least 20 known binding sites, with many showing substantial improvements. We highlight the advantage of using block alignments of binding sites over gapped alignments to capture neighbouring position interdependencies. We also show that combining these methods with ChIP-on-chip data has the potential to further improve binding site prediction. Finally we have developed the ungapped likelihood under positional background platform: a user friendly website that gives access to the prediction method devised in this work.
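
The contrast the abstract draws between position-specific weight matrices and a model that conditions each base on its neighbour can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the pseudocount of 1, the toy binding sites and the uniform zero-order background are all assumptions made for illustration, and the simple background used here stands in for the genomic background weighting described in the paper.

```python
import math
from collections import Counter

BASES = "ACGT"


def position_probs(sites, pseudo=1.0):
    """Per-column base probabilities from an ungapped (block) alignment."""
    total = len(sites) + pseudo * len(BASES)
    probs = []
    for i in range(len(sites[0])):
        counts = Counter(s[i] for s in sites)
        probs.append({b: (counts[b] + pseudo) / total for b in BASES})
    return probs


def conditional_probs(sites, pseudo=1.0):
    """P(base at column i | base at column i-1) for each adjacent column pair."""
    cond = []
    for i in range(1, len(sites[0])):
        pair_counts = Counter((s[i - 1], s[i]) for s in sites)
        prev_counts = Counter(s[i - 1] for s in sites)
        table = {}
        for a in BASES:
            denom = prev_counts[a] + pseudo * len(BASES)
            for b in BASES:
                table[(a, b)] = (pair_counts[(a, b)] + pseudo) / denom
        cond.append(table)
    return cond


def pswm_score(seq, probs, background):
    """Log-odds score that treats every column independently."""
    return sum(math.log(probs[i][b] / background[b]) for i, b in enumerate(seq))


def neighbour_score(seq, probs, cond, background):
    """Log-odds score where each base is conditioned on the preceding base."""
    score = math.log(probs[0][seq[0]] / background[seq[0]])
    for i in range(1, len(seq)):
        p = cond[i - 1][(seq[i - 1], seq[i])]
        score += math.log(p / background[seq[i]])
    return score


# Hypothetical training sites in which columns 2 and 3 co-vary (CG or TA).
toy_sites = ["ACGT", "ATAT", "ACGT", "ATAT"]
uniform_background = {b: 0.25 for b in BASES}  # assumed, not genome-derived

toy_probs = position_probs(toy_sites)
toy_cond = conditional_probs(toy_sites)

for candidate in ("ACGT", "ACAT"):
    print(
        candidate,
        round(pswm_score(candidate, toy_probs, uniform_background), 3),
        round(neighbour_score(candidate, toy_probs, toy_cond, uniform_background), 3),
    )
```

In this toy data the candidates ACGT and ACAT have identical column-wise frequencies, so the independent-column score cannot separate them, whereas the neighbour-conditioned score favours ACGT because the pair CG was observed in training and CA was not.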

Original language	English
Pages (from-to)	e135
Journal	Nucleic Acids Research
Volume	38
Issue number	12
DOIs	https://doi.org/10.1093/nar/gkq274
Publication status	Published - 1 Jul 2010

Cite this

@article{c58b8b57a5394e45b1d9c7fb06a2cecb,
title = "Inclusion of neighboring base interdependencies substantially improves genome-wide prokaryotic transcription factor binding site prediction.",
abstract = "Prediction of transcription factor binding sites is an important challenge in genome analysis. The advent of next generation genome sequencing technologies makes the development of effective computational approaches particularly imperative. We have developed a novel training-based methodology intended for prokaryotic transcription factor binding site prediction. Our methodology extends existing models by taking into account base interdependencies between neighbouring positions using conditional probabilities and includes genomic background weighting. This has been tested against other existing and novel methodologies including position-specific weight matrices, first-order Hidden Markov Models and joint probability models. We have also tested the use of gapped and ungapped alignments and the inclusion or exclusion of background weighting. We show that our best method enhances binding site prediction for all of the 22 Escherichia coli transcription factors with at least 20 known binding sites, with many showing substantial improvements. We highlight the advantage of using block alignments of binding sites over gapped alignments to capture neighbouring position interdependencies. We also show that combining these methods with ChIP-on-chip data has the potential to further improve binding site prediction. Finally we have developed the ungapped likelihood under positional background platform: a user friendly website that gives access to the prediction method devised in this work.",
author = "Rafik Salama and Dov Stekel",
year = "2010",
month = jul,
day = "1",
doi = "10.1093/nar/gkq274",
language = "English",
volume = "38",
pages = "e135",
journal = "Nucleic Acids Research",
issn = "1362-4962",
publisher = "Oxford University Press",
number = "12",
}

TY  - JOUR
T1  - Inclusion of neighboring base interdependencies substantially improves genome-wide prokaryotic transcription factor binding site prediction.
AU  - Salama, Rafik
AU  - Stekel, Dov
PY  - 2010/7/1
Y1  - 2010/7/1
AB  - Prediction of transcription factor binding sites is an important challenge in genome analysis. The advent of next generation genome sequencing technologies makes the development of effective computational approaches particularly imperative. We have developed a novel training-based methodology intended for prokaryotic transcription factor binding site prediction. Our methodology extends existing models by taking into account base interdependencies between neighbouring positions using conditional probabilities and includes genomic background weighting. This has been tested against other existing and novel methodologies including position-specific weight matrices, first-order Hidden Markov Models and joint probability models. We have also tested the use of gapped and ungapped alignments and the inclusion or exclusion of background weighting. We show that our best method enhances binding site prediction for all of the 22 Escherichia coli transcription factors with at least 20 known binding sites, with many showing substantial improvements. We highlight the advantage of using block alignments of binding sites over gapped alignments to capture neighbouring position interdependencies. We also show that combining these methods with ChIP-on-chip data has the potential to further improve binding site prediction. Finally we have developed the ungapped likelihood under positional background platform: a user friendly website that gives access to the prediction method devised in this work.
U2  - 10.1093/nar/gkq274
DO  - 10.1093/nar/gkq274
M3  - Article
C2  - 20439311
SN  - 1362-4962
VL  - 38
SP  - e135
JO  - Nucleic Acids Research
JF  - Nucleic Acids Research
IS  - 12
ER  -
