Motif searches in sequence databases


Motif searches in sequence databases:

Several databases provide information about conserved regions within proteins. These databases can be queried with one's own sequences or sequence blocks. The following websites give access to several motif search algorithms:

Overview of databases of protein sequence motifs
Overview of databases of protein domains
Overview of databases of individual protein families
The PROSITE Database of Protein Families and Domains
- ScanProsite - Protein sequence versus PROSITE or Pattern versus SWISS-PROT
- ProfileScan - Protein sequence versus Profile Database
- Other pattern and profile search tools
The MEME/MAST System:
MEME (Multiple EM for Motif Elicitation) is a tool for discovering motifs in a group of related DNA or protein sequences.
MAST (Multiple Alignment and Search Tool) is a tool for searching biological sequence databases for sequences that contain one or more of a group of known motifs.
The Blocks Database
- Search of a database entry
- Blocks Searcher: Protein sequence versus Blocks
- Reverse PSI-BLAST: Protein sequence versus Blocks
- IMPALA (Integrating Matrix Profiles And Local Alignments): Protein sequence versus Blocks
- LAMA (Local Alignment of Multiple Alignments): Sequence block (multiple alignment) versus Blocks
- CODEHOP (COnsensus-DEgenerate Hybrid Oligonucleotide Primers): Design of oligonucleotides for hybridization or PCR experiments
The PRINTS Protein motif fingerprint database
- Search of a database entry
- FingerPRINTScan: Protein sequence versus PRINTS
- BLAST-Search: Protein sequence versus PRINTS
The Pfam Protein Family database at the Sanger Center or Pfam at WUStL
The ProDom Protein Domain database
The DOMO Database of homologous protein Domain families
The eMOTIF Protein sequence Motif determination and searches
The PROF_PAT Database of protein family Patterns
The SBASE Protein domain library
The SYSTERS (SYSTEmatic Re-Searching) protein sequence cluster set
The iProClass Integrated Protein Classification database
The InterPro Integrated resource of Protein domains and functional sites
- InterProScan - Protein sequence versus Profile Database
The MetaFam Unified classification of protein Families (OUTDATED)

PROSITE - Database of Protein Families and Domains

patterns (motifs) and profiles (weight matrices)

The Blocks database

Blocks Database

PROSITE Database

SWISS-PROT database

ExPASy World Wide Web (WWW) Molecular Biology Server

Block format

Introduction

current database

The Prints Protein Motif Fingerprint Database in Blocks Format

Blocks from Pfam protein groups

Blocks from ProDom protein groups

Blocks from Domo protein groups

PRINTS - Protein Motif Fingerprint Database

Pfam - Protein Family Database

ProDom - Protein Domain Database

July 1998 (ProDom 35)

March 2001 (ProDom 2001.1)

DOMO - Database of Homologous Protein Domain Families

InterPro - Integrated Resource of Protein Domains and Functional Sites

Further search possibilities in sequence databases:

In addition to searches for similar (homologous) sequences or sequence motifs, one can also search sequence databases by other criteria. Some interesting approaches are listed below:

Hydrophilicity search

This page calculates the hydrophilicity/hydrophobicity profile of a given protein and compares it with a library of protein hydropathic profiles (in this case the library of protein hydrophilicities is made from the SWISS-PROT database). Please enter information about the query protein, select a method of calculating the hydrophilicity, and the size of the window to calculate it over, and the number of matches you would like to see. A page will then be returned listing the best protein hydropathic profile matches. To see a plot of the two protein hydropathic profiles simply select the protein you are interested in.
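The calculation described above can be illustrated with a minimal sketch. It assumes the Kyte-Doolittle hydropathy scale and a root-mean-square comparison of profiles over their common length; the scales and the scoring actually used by the server are not stated here, and the function names hydropathy_profile, profile_distance and rank_library are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=7):
    """Return the mean hydropathy of every window of `window` residues."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and the sequence length")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def profile_distance(p, q):
    """Root-mean-square difference over the common length (an assumption)."""
    n = min(len(p), len(q))
    if n == 0:
        raise ValueError("profiles must not be empty")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)) / n)


def rank_library(query, library, window=7, top=5):
    """Return the `top` library entries as (distance, name), closest first."""
    query_profile = hydropathy_profile(query, window)
    scored = [
        (profile_distance(query_profile, hydropathy_profile(seq, window)), name)
        for name, seq in library.items()
    ]
    return sorted(scored)[:top]
```

Calling rank_library(query, {"name": "SEQUENCE", ...}, window=7, top=5) returns the closest profiles first. The window size and the number of matches correspond to the two settings requested by the form.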
PROPSEARCH

Common protein sequence alignment programs cannot at present detect functional and/or structural homologs if the sequence identity is below the significance threshold of about 25%. PROPSEARCH was designed to find the putative protein family when querying with a new sequence has failed using alignment methods. PROPSEARCH neglects the order of amino acid residues in a sequence and uses the amino acid composition instead. In addition, other properties such as molecular weight, content of bulky residues, content of small residues, average hydrophobicity, average charge, and so on, and the content of selected dipeptide groups are calculated from the sequence as well. 144 such properties are weighted individually and are used as the query vector. The weights have been trained on a set of protein families with known structures, using a genetic algorithm. Sequences in the database are transformed into vectors as well, and the Euclidean distance between the query and database sequences is calculated. Distances are rank-ordered, and the sequences with the lowest distance are reported at the top (Hobohm and Sander, 1995).
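The principle can be sketched in a few lines. This is only an illustration under simplifying assumptions: it uses the 20 residue fractions instead of the 144 properties, and uniform weights instead of the weights trained with the genetic algorithm. The function names property_vector, weighted_distance and rank_database are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def property_vector(sequence):
    """Order-independent description of a sequence: the 20 residue fractions."""
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def weighted_distance(u, v, weights=None):
    """Weighted Euclidean distance; uniform weights are a placeholder."""
    if weights is None:
        weights = [1.0] * len(u)
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, u, v)))


def rank_database(query, database, weights=None):
    """Return (distance, name) pairs with the lowest distance on top."""
    query_vector = property_vector(query)
    hits = [
        (weighted_distance(query_vector, property_vector(seq), weights), name)
        for name, seq in database.items()
    ]
    return sorted(hits)
```

Because only the composition enters the vector, two sequences with the same residues in a different order have a distance of zero. This is the price of neglecting residue order, and it is also what lets the method find relatives that alignment methods miss.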
PatScan
PatScan is a pattern matcher which searches protein or nucleotide (DNA, RNA, tRNA etc.) sequence archives for instances of a pattern which you input (Dsouza et al., 1997).
AACompIdent
AACompIdent is a tool which allows the identification of a protein from its amino acid composition. It searches the SWISS-PROT and/or TrEMBL databases for proteins whose amino acid compositions are closest to the amino acid composition given. You will have to enter the following data:
1. Amino acid composition of the protein to identify.
2. A name for this protein, so that you can recognize it later in the results.
3. The pI and Mw of that protein, if known, as well as error ranges that reflect the accuracy of these estimates.
4. The species or group of species for which you would like to perform the search (example: HOMO SAPIENS or MAMMALIA). This will produce the list of proteins from this species, as well as a list of proteins independent of species. You may also just specify ALL for all SWISS-PROT / TrEMBL entries; if in doubt about the search term to use, consult the SWISS-PROT list of species.
5. For scans in SWISS-PROT only: the keyword for which you would like to perform the search (example: ZINC-FINGER). This will produce the list of proteins matching this keyword. You may also just specify ALL for all SWISS-PROT entries; if in doubt about the exact keyword to use, consult the list of keywords used in SWISS-PROT.
6. Amino acid composition of a known protein, obtained in the same run as the amino acid composition of the unknown protein. This is for calibration; if you do not have a calibration protein, leave NULL.
7. The SWISS-PROT identifier (ID) of the calibration protein (example: ALBU_HUMAN).
8. Your e-mail address. The search results will be mailed back to you automatically.
More protein identification tools at ExPASy

Examples of PROSITE search motifs:

Pattern
Profile
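
A PROSITE pattern is written in a compact syntax: elements are separated by "-", "x" stands for any residue, "[ST]" lists allowed residues, "{P}" lists forbidden residues, "(n)" or "(n,m)" gives a repeat count, and "<" or ">" anchors the pattern to the N- or C-terminus. The following sketch translates such a pattern into a regular expression and scans a sequence with it. It is an illustration only, not the ScanProsite implementation; the function names prosite_to_regex and scan are hypothetical, and rarer constructs such as ">" inside square brackets are not handled.

```python
import re

# One pattern element: a residue, "x", "[...]" or "{...}", with optional repeat.
ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern string into a Python regular expression."""
    pattern = pattern.strip().rstrip(".")  # patterns end with a period
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residues, low, high = match.groups()
        if residues == "x":
            core = "."  # any residue
        elif residues.startswith("{"):
            core = "[^" + residues[1:-1] + "]"  # forbidden residues
        else:
            core = residues  # single residue or [allowed residues]
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


def scan(pattern, sequence):
    """Return (1-based position, matched text) for all hits, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence.upper())]
```

For example, the N-glycosylation site pattern N-{P}-[ST]-{P} becomes the regular expression N[^P][ST][^P]. The lookahead in scan is a design choice that reports overlapping hits, which a plain left-to-right search would skip.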

Sequence examples:

Database of amino acid sequences via Entrez
Bacteriorhodopsin from Halobacterium salinarium: Seven-helix bundle protein
TonB from Escherichia coli: Protein with N-terminal transmembrane helix
Maltose-binding protein from Escherichia coli: Protein with N-terminal signal sequence for secretion into the periplasmic space

Abbreviations:

CODEHOP: COnsensus-DEgenerate Hybrid Oligonucleotide Primers
IMPALA: Integrating Matrix Profiles And Local Alignments
LAMA: Local Alignment of Multiple Alignments
MAST: Multiple Alignment and Search Tool
MEME: Multiple EM for Motif Elicitation
SMART: Simple Modular Architecture Research Tool
SYSTERS: SYSTEmatic Re-Searching

References

Latest update of content: September 20, 2005

Ralf Koebnik
Institut de recherche pour le développement
UMR 5096, CNRS-UP-IRD
911, Avenue Agropolis, BP 64501
34394 Montpellier, Cedex 5
FRANCE
Phone: +33 (0)4 67 41 62 28
Fax: +33 (0)4 67 41 61 81
Email: koebnik(at)gmx.de
Please replace (at) by @.
