SIMAP: the similarity matrix of proteins

Thomas Rattei; Roland Arnold; Patrick Tischler; Dominik Lindner; Volker Stümpflen; H Werner Mewes

doi:10.1093/nar/gkj106

Cancer and Genomic Sciences

Research output: Contribution to journal › Article › peer-review

38 Citations (Scopus)

Abstract

Similarity Matrix of Proteins (SIMAP) (http://mips.gsf.de/simap) provides a database based on a pre-computed similarity matrix covering the similarity space formed by >4 million amino acid sequences from public databases and completely sequenced genomes. The database is capable of handling very large datasets and is updated incrementally. For sequence similarity searches and pairwise alignments, we implemented a grid-enabled software system, which is based on FASTA heuristics and the Smith-Waterman algorithm. Our ProtInfo system allows querying by protein sequences covered by the SIMAP dataset as well as by fragments of these sequences, highly similar sequences and title words. Each sequence in the database is supplemented with pre-calculated features generated by detailed sequence analyses. By providing WWW interfaces as well as web-services, we offer the SIMAP resource as an efficient and comprehensive tool for sequence similarity searches.
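The abstract names the Smith-Waterman algorithm and a pre-computed similarity matrix but gives no implementation details. The following is a minimal Python sketch of Smith-Waterman local alignment scoring, with a helper that fills a small all-against-all score table. It is not the SIMAP implementation: the function names, the flat match/mismatch scores, and the linear gap penalty are assumptions chosen for illustration, and the FASTA heuristics mentioned in the abstract are not modelled.

```python
from itertools import combinations


def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b.

    Scoring values are illustrative assumptions (flat match/mismatch,
    linear gap penalty), not the parameters used by SIMAP.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0,
                          prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                          prev[j] + gap,      # gap in b
                          curr[j - 1] + gap)  # gap in a
            best = max(best, curr[j])
        prev = curr
    return best


def pairwise_scores(seqs: list[str]) -> dict[tuple[int, int], int]:
    """Pre-compute scores for every unordered pair of sequences (i < j)."""
    return {(i, j): smith_waterman_score(seqs[i], seqs[j])
            for i, j in combinations(range(len(seqs)), 2)}


if __name__ == "__main__":
    # Toy sequences, invented for this example.
    toy = ["XXACDEXX", "YYACDEYY", "WWWW"]
    for pair, score in pairwise_scores(toy).items():
        print(pair, score)
```

Storing only the pairs with i < j relies on the score being symmetric under this scoring scheme, which halves the work of an all-against-all comparison.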

Original language	English
Pages (from-to)	D252-6
Journal	Nucleic Acids Research
Volume	34
Issue number	Database issue
DOIs	https://doi.org/10.1093/nar/gkj106
Publication status	Published - 1 Jan 2006

Keywords

Databases, Protein
Internet
Sequence Alignment
Sequence Homology, Amino Acid
Software
User-Computer Interface
Journal Article
Research Support, Non-U.S. Gov't

Cite this

@article{764545ca076a4daf82c6ae0b9beee7cd,
title = "SIMAP: the similarity matrix of proteins",
abstract = "Similarity Matrix of Proteins (SIMAP) (http://mips.gsf.de/simap) provides a database based on a pre-computed similarity matrix covering the similarity space formed by >4 million amino acid sequences from public databases and completely sequenced genomes. The database is capable of handling very large datasets and is updated incrementally. For sequence similarity searches and pairwise alignments, we implemented a grid-enabled software system, which is based on FASTA heuristics and the Smith-Waterman algorithm. Our ProtInfo system allows querying by protein sequences covered by the SIMAP dataset as well as by fragments of these sequences, highly similar sequences and title words. Each sequence in the database is supplemented with pre-calculated features generated by detailed sequence analyses. By providing WWW interfaces as well as web-services, we offer the SIMAP resource as an efficient and comprehensive tool for sequence similarity searches.",
keywords = "Databases, Protein, Internet, Sequence Alignment, Sequence Homology, Amino Acid, Software, User-Computer Interface, Journal Article, Research Support, Non-U.S. Gov't",
author = "Thomas Rattei and Roland Arnold and Patrick Tischler and Dominik Lindner and Volker St{\"u}mpflen and Mewes, {H Werner}",
year = "2006",
month = jan,
day = "1",
doi = "10.1093/nar/gkj106",
language = "English",
volume = "34",
pages = "D252--6",
journal = "Nucleic Acids Research",
issn = "0305-1048",
publisher = "Oxford University Press",
number = "Database issue",
}

TY  - JOUR
T1  - SIMAP
T2  - the similarity matrix of proteins
AU  - Rattei, Thomas
AU  - Arnold, Roland
AU  - Tischler, Patrick
AU  - Lindner, Dominik
AU  - Stümpflen, Volker
AU  - Mewes, H Werner
PY  - 2006/1/1
Y1  - 2006/1/1
N2  - Similarity Matrix of Proteins (SIMAP) (http://mips.gsf.de/simap) provides a database based on a pre-computed similarity matrix covering the similarity space formed by >4 million amino acid sequences from public databases and completely sequenced genomes. The database is capable of handling very large datasets and is updated incrementally. For sequence similarity searches and pairwise alignments, we implemented a grid-enabled software system, which is based on FASTA heuristics and the Smith-Waterman algorithm. Our ProtInfo system allows querying by protein sequences covered by the SIMAP dataset as well as by fragments of these sequences, highly similar sequences and title words. Each sequence in the database is supplemented with pre-calculated features generated by detailed sequence analyses. By providing WWW interfaces as well as web-services, we offer the SIMAP resource as an efficient and comprehensive tool for sequence similarity searches.
AB  - Similarity Matrix of Proteins (SIMAP) (http://mips.gsf.de/simap) provides a database based on a pre-computed similarity matrix covering the similarity space formed by >4 million amino acid sequences from public databases and completely sequenced genomes. The database is capable of handling very large datasets and is updated incrementally. For sequence similarity searches and pairwise alignments, we implemented a grid-enabled software system, which is based on FASTA heuristics and the Smith-Waterman algorithm. Our ProtInfo system allows querying by protein sequences covered by the SIMAP dataset as well as by fragments of these sequences, highly similar sequences and title words. Each sequence in the database is supplemented with pre-calculated features generated by detailed sequence analyses. By providing WWW interfaces as well as web-services, we offer the SIMAP resource as an efficient and comprehensive tool for sequence similarity searches.
KW  - Databases, Protein
KW  - Internet
KW  - Sequence Alignment
KW  - Sequence Homology, Amino Acid
KW  - Software
KW  - User-Computer Interface
KW  - Journal Article
KW  - Research Support, Non-U.S. Gov't
U2  - 10.1093/nar/gkj106
DO  - 10.1093/nar/gkj106
M3  - Article
C2  - 16381858
SN  - 0305-1048
VL  - 34
SP  - D252-6
JO  - Nucleic Acids Research
JF  - Nucleic Acids Research
IS  - Database issue
ER  - 
