Homology searches in DNA and protein databases: BLAST and FASTA

Homology searches on the internet:

BCM search platform for Nucleotide sequences: WU-BLASTX, BLASTN, TBLASTX, BEAUTY-X, tRNAscan-SE
BCM search platform for Amino acid sequences: (WU-BLASTP), BLASTP, TBLASTN, FASTA, FASTA-SWAP, BEAUTY, (PPSEARCH), BLOCKS Search, COGNITOR, (BLITZ [Smith-Waterman]), MPsrch, (SSEARCH [Smith-Waterman])
BLAST searches at NIH: BLAST 2.0, PSI-BLAST, PHI-BLAST, MEGABLAST
- BLAST: Overview
- BLAST: Problems (FAQ)
- BLAST: Tutorial
- BLAST: Course 1/3 (BLAST statistics)
- BLAST: Course 2/3 (PSI-BLAST)
- BLAST: Course 3/3 (PSI-BLAST statistics)
WU-BLAST at Washington University moved to Advanced Biocomputing and became AB-BLAST
WU-BLAST2 at EMBL Heidelberg
- AB-BLAST
- AB-BLAST: FAQ
Similarity search Tools at EBI
- BLAST: WU-BLAST2, NCBI-BLAST2, PHI-BLAST, PSI-BLAST
- FASTA
- Bic_SW (Smith-Waterman) (OUTDATED)
- MPsearch (Smith-Waterman)
- ScanPS (Smith-Waterman)
- SSEARCH (Smith-Waterman)
- PSI-Search (Smith-Waterman-based PSI-BLAST)
- GGSEARCH (Needleman-Wunsch)
- GLSEARCH
- PPSearch
Mega BLAST at Toulouse: BLASTN gapped alignment search
Mega BLAST at NIH: Ultra-fast BLAST for highly similar sequences
Saturated BLAST at UC San Diego: detects distant homology (broadly similar to PSI-BLAST)
EASY at University of Manchester: Expert Analysis SYstem (OUTDATED)

Primer to database searches:

Always compare protein sequences if the genes encode proteins.

Matches that are more than 50% identical in a 20- to 40-amino acid region occur frequently by chance and do not indicate homology.
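The quantity this rule refers to, percent identity over a short ungapped region, can be computed as in the following minimal sketch. The function names `percent_identity` and `best_window` are hypothetical, and the two sequences are assumed to be already aligned to equal length without gaps.

```python
from typing import Tuple


def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions in two equal-length, ungapped segments."""
    if len(a) != len(b):
        raise ValueError("segments must have equal length")
    if not a:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def best_window(a: str, b: str, width: int) -> Tuple[int, float]:
    """Slide a fixed-width window along two pre-aligned sequences and
    return (start, percent identity) of the most identical window."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    if width < 1 or width > len(a):
        raise ValueError("window width must be between 1 and the sequence length")
    best_start, best_pid = 0, -1.0
    for start in range(len(a) - width + 1):
        pid = percent_identity(a[start:start + width], b[start:start + width])
        if pid > best_pid:  # keep the first window on ties
            best_start, best_pid = start, pid
    return best_start, best_pid
```

A high value from `best_window` with a width of 20 to 40 residues is, by itself, not evidence of homology; it only locates the region that would need statistical support.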

"Is my query sequence homologous to anything in the database?"

Whenever possible, compare at the amino acid level, rather than the nucleotide level.

Consider searches with different gap penalties and other scoring matrices.

The E value is only the first step in characterizing a sequence relationship. Once one has confidence that the sequences are homologous, one should look at the sequence alignments and percent identities.
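For orientation, the E value in BLAST-style searches is commonly derived from Karlin-Altschul statistics, E = K * m * n * exp(-lambda * S), where S is the raw alignment score, m the query length and n the database length. The sketch below illustrates that relationship only. The function names are hypothetical, and the values used for lambda and K in any call are placeholders: the real values depend on the scoring matrix and gap penalties and are reported in the search output.

```python
import math


def expected_hits(raw_score: float, query_len: int, db_len: int,
                  lam: float, k: float) -> float:
    """E value: expected number of chance alignments scoring >= raw_score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized (bit) score, which removes the dependence on lambda and K."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expected_hits_from_bits(bits: float, query_len: int, db_len: int) -> float:
    """Same E value, expressed through the bit score: E = m * n * 2^(-bits)."""
    return query_len * db_len * 2.0 ** (-bits)
```

Because E scales with the search space m * n, the same alignment receives a larger E value in a larger database, which is one reason the alignment itself should be inspected afterwards.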

Whereas homology implies common three-dimensional structure, homology need not imply common function.

Sequence formats:

Example of a SwissProt entry in FASTA format
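As a rough illustration of the format, a FASTA record consists of a header line starting with ">" followed by one or more sequence lines. The sketch below reads such records and splits a header of the form `db|accession|entry_name description`, which is the convention assumed here for SwissProt-derived entries. The record in `EXAMPLE` is made up for illustration and does not correspond to a real database entry; `parse_fasta` and `split_header` are hypothetical helper names.

```python
from typing import Dict, Iterator, List, Optional, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are concatenated
    if header is not None:
        yield header, "".join(chunks)


def split_header(header: str) -> Dict[str, str]:
    """Split a 'db|accession|entry_name description' header into its parts.
    Headers that do not follow this layout keep the first word as accession."""
    ident, _, description = header.partition(" ")
    parts = ident.split("|")
    if len(parts) == 3:
        db, accession, name = parts
    else:
        db, accession, name = "", ident, ""
    return {"db": db, "accession": accession,
            "name": name, "description": description}


# Made-up record, for illustration only (not a real SwissProt entry).
EXAMPLE = """>sp|X00000|EXAMPLE_ORGSP Made-up example protein
MKTAYIAKQR
QISFVKSHFS
"""

records = list(parse_fasta(EXAMPLE))
```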

Similarity matrices:

PAM250: Percent Accepted Mutation-Matrix (Dayhoff et al., 1978)
BLOSUM62: Blocks Substitution-Matrix (Henikoff and Henikoff, 1992)
More Similarity matrices
- Help on the proper use of similarity matrices
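How a similarity matrix is applied can be sketched as follows: each aligned residue pair is looked up in a symmetric table and the scores are summed. The values in `TOY_MATRIX` are invented for illustration and are not taken from PAM250 or BLOSUM62; the function names are hypothetical as well.

```python
from typing import Dict, Tuple

Matrix = Dict[Tuple[str, str], int]

# Toy values for three residues only (NOT real PAM250 or BLOSUM62 scores).
# Only one orientation of each pair is stored; lookup handles the symmetry.
TOY_MATRIX: Matrix = {
    ("A", "A"): 4, ("L", "L"): 4, ("I", "I"): 4,
    ("A", "L"): -1, ("A", "I"): -1, ("I", "L"): 2,
}


def pair_score(matrix: Matrix, x: str, y: str) -> int:
    """Score for aligning residue x with residue y (order does not matter)."""
    if (x, y) in matrix:
        return matrix[(x, y)]
    return matrix[(y, x)]  # raises KeyError for residues missing from the table


def ungapped_score(matrix: Matrix, a: str, b: str) -> int:
    """Sum of pair scores over two equal-length segments aligned without gaps."""
    if len(a) != len(b):
        raise ValueError("segments must have equal length")
    return sum(pair_score(matrix, x, y) for x, y in zip(a, b))
```

With a real matrix the same lookup applies; swapping the table is what "searching with another scoring matrix" amounts to at this level.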

Sequence examples:

Database of amino acid sequences via Entrez
Database of nucleotide sequences via Entrez
Bacteriorhodopsin from Halobacterium salinarium: Seven-helix bundle protein
TonB from Escherichia coli: Protein with N-terminal transmembrane helix
Maltose-binding protein from Escherichia coli: Protein with N-terminal signal sequence for secretion into the periplasmic space
OmpA from Escherichia coli: Two-domain protein; the N-terminal domain is embedded in the outer membrane in the form of an 8-stranded β-barrel, while the C-terminal domain is located in the periplasmic space

Abbreviations:

BALSA: Bayesian Algorithm for Local Sequence Alignment
BEAUTY: BLAST Enhanced Alignment Utility
BLAST: Basic Local Alignment Search Tool
BLOSUM: Blocks Substitution-Matrix
EASY: Expert Analysis SYstem
PAM: Percent Accepted Mutation
PHI-BLAST: Pattern-Hit Initiated BLAST
PSI-BLAST: Position-Specific Iterated BLAST
RPS-BLAST: Reverse Position Specific BLAST
WU: Washington University
BLASTN: Nucleotide sequence versus nucleotide sequence database
BLASTP: Amino acid sequence versus amino acid sequence database
BLASTX: Translated nucleotide sequence versus amino acid sequence database
TBLASTN: Amino acid sequence versus translated nucleotide sequence database
TBLASTX: Translated nucleotide sequence versus translated nucleotide sequence database
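The five program variants listed above differ only in the type of query, the type of database, and which side is translated. The following sketch encodes that table so that the applicable programs can be looked up; the `Program` record and both helper functions are hypothetical names introduced for illustration.

```python
from typing import List, NamedTuple


class Program(NamedTuple):
    name: str
    query: str                 # "nucleotide" or "protein"
    database: str              # "nucleotide" or "protein"
    translates_query: bool
    translates_database: bool


# One row per program, as described in the abbreviation list.
PROGRAMS = [
    Program("BLASTN", "nucleotide", "nucleotide", False, False),
    Program("BLASTP", "protein", "protein", False, False),
    Program("BLASTX", "nucleotide", "protein", True, False),
    Program("TBLASTN", "protein", "nucleotide", False, True),
    Program("TBLASTX", "nucleotide", "nucleotide", True, True),
]


def candidate_programs(query: str, database: str) -> List[str]:
    """Names of the programs that accept this query/database combination."""
    return [p.name for p in PROGRAMS
            if p.query == query and p.database == database]


def compares_at_protein_level(name: str) -> bool:
    """True if the comparison itself is carried out on amino acid sequences."""
    p = next(p for p in PROGRAMS if p.name == name)
    query_is_protein = p.query == "protein" or p.translates_query
    database_is_protein = p.database == "protein" or p.translates_database
    return query_is_protein and database_is_protein
```

For a nucleotide query against a nucleotide database, `candidate_programs` returns both BLASTN and TBLASTX; of the two, only TBLASTX compares at the amino acid level.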

Latest update: November 19, 2009

Ralf Koebnik
Institut de recherche pour le développement
UMR 5096, CNRS-UP-IRD
911, Avenue Agropolis, BP 64501
34394 Montpellier, Cedex 5
FRANCE
Phone: +33 (0)4 67 41 62 28
Fax: +33 (0)4 67 41 61 81
Email: koebnik(at)gmx.de
Please replace (at) with @.
