Predicting Protein Function by Machine Learning on Amino Acid Sequences - A Critical Evaluation

Ali Al-Shahib; R Breitling; DR Gilbert

doi:10.1186/1471-2164-8-78

Electronic, Electrical and Systems Engineering

Research output: Contribution to journal › Article › peer-review

25 Citations (Scopus)

132 Downloads (Pure)

Abstract

BACKGROUND: Predicting the function of newly discovered proteins by simply inspecting their amino acid sequence is one of the major challenges of post-genomic computational biology, especially when done without recourse to experimentation or homology information. Machine learning classifiers are able to discriminate between proteins belonging to different functional classes. Until now, however, it has been unclear whether this ability would be transferable to proteins of unknown function, which may show distinct biases compared to experimentally more tractable proteins.

RESULTS: Here we show that proteins with known and unknown function do indeed differ significantly. We then show that proteins from different bacterial species also differ to an even larger and very surprising extent, but that functional classifiers nonetheless generalize successfully across species boundaries. We also show that in the case of highly specialized proteomes, classifiers from a different, but more conventional, species may in fact outperform the endogenous species-specific classifier.

CONCLUSION: We conclude that there is a very good prospect of successfully predicting the function of as yet uncharacterized proteins using machine learning classifiers trained on proteins of known function.

Original language	English
Article number	78
Journal	BMC Genomics
Volume	8
Issue number	78
DOIs	https://doi.org/10.1186/1471-2164-8-78
Publication status	Published - 20 Mar 2007

Access to Document

10.1186/1471-2164-8-78

Al-Shahib_et_al_Predicting_Protein_Function_BMC_Genomics_2007
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Checked July 2015
Final published version, 283 KB
Licence: Creative Commons: Attribution (CC BY)

Cite this

@article{1929b3f1fe0c499895d0012fb04317db,
title = "Predicting Protein Function by Machine Learning on Amino Acid Sequences - A Critical Evaluation",
abstract = "BACKGROUND: Predicting the function of newly discovered proteins by simply inspecting their amino acid sequence is one of the major challenges of post-genomic computational biology, especially when done without recourse to experimentation or homology information. Machine learning classifiers are able to discriminate between proteins belonging to different functional classes. Until now, however, it has been unclear whether this ability would be transferable to proteins of unknown function, which may show distinct biases compared to experimentally more tractable proteins. RESULTS: Here we show that proteins with known and unknown function do indeed differ significantly. We then show that proteins from different bacterial species also differ to an even larger and very surprising extent, but that functional classifiers nonetheless generalize successfully across species boundaries. We also show that in the case of highly specialized proteomes, classifiers from a different, but more conventional, species may in fact outperform the endogenous species-specific classifier. CONCLUSION: We conclude that there is a very good prospect of successfully predicting the function of as yet uncharacterized proteins using machine learning classifiers trained on proteins of known function.",
author = "Ali Al-Shahib and R Breitling and DR Gilbert",
year = "2007",
month = mar,
day = "20",
doi = "10.1186/1471-2164-8-78",
language = "English",
volume = "8",
journal = "BMC Genomics",
publisher = "Springer",
number = "78",
}

TY  - JOUR
T1  - Predicting Protein Function by Machine Learning on Amino Acid Sequences - A Critical Evaluation
AU  - Al-Shahib, Ali
AU  - Breitling, R
AU  - Gilbert, DR
PY  - 2007/3/20
Y1  - 2007/3/20
AB  - BACKGROUND: Predicting the function of newly discovered proteins by simply inspecting their amino acid sequence is one of the major challenges of post-genomic computational biology, especially when done without recourse to experimentation or homology information. Machine learning classifiers are able to discriminate between proteins belonging to different functional classes. Until now, however, it has been unclear whether this ability would be transferable to proteins of unknown function, which may show distinct biases compared to experimentally more tractable proteins. RESULTS: Here we show that proteins with known and unknown function do indeed differ significantly. We then show that proteins from different bacterial species also differ to an even larger and very surprising extent, but that functional classifiers nonetheless generalize successfully across species boundaries. We also show that in the case of highly specialized proteomes, classifiers from a different, but more conventional, species may in fact outperform the endogenous species-specific classifier. CONCLUSION: We conclude that there is a very good prospect of successfully predicting the function of as yet uncharacterized proteins using machine learning classifiers trained on proteins of known function.
U2  - 10.1186/1471-2164-8-78
DO  - 10.1186/1471-2164-8-78
M3  - Article
C2  - 17374164
VL  - 8
JO  - BMC Genomics
JF  - BMC Genomics
IS  - 78
M1  - 78
ER  -
