More operations on BLAST
- Environment (OS): Linux-2.6.32-358.14.1.el6.i686, CentOS-6.4
DataSets and Default parameters
searching database: swissprot (taken from the textbook's resources and downloaded from http://bio.biomedicine.gu.se/gb/7/swissprot)
query sequence: ABL1_HUMAN (extracted from the swissprot database with the blastdbcmd command and its -entry option)
The table below lists the default parameters of blastp; the operations that follow do not all use these defaults.
Parameter | Value | Parameter | Value |
---|---|---|---|
Gap opening penalty | 11 | Gap extension penalty | 1 |
Nucleic match | n/a | Nucleic mismatch | n/a |
Expectation value | 10.0 (real) | Word size | 3 |
Max scores | 25 | Max alignments | 15 |
Query filter | SEG | Query genetic code | n/a |
Matrix | BLOSUM62 | | |
- blastp parameters tested and their corresponding values
Parameters | Values | Descriptions |
---|---|---|
-matrix | BLOSUM45, BLOSUM62, BLOSUM80 | - |
-word_size | 3, 4, 5, 6, 7 | - |
-gapopen (with -gapextend 1) | 13 | 1 and 21 are not suitable for BLOSUM62 |
-gapextend (with -gapopen 11) | 2 | 3 is not suitable for BLOSUM62 |
-gapopen v1 -gapextend v2 | (v1,v2) = (6,2), (9,2), (10,1), (12,1) | each pair differs from the default (11,1) in at least one value |
Recording table and command line operations in the Linux environment
Command line operations in the Linux environment:
- create database :
makeblastdb -in ./swissprot -dbtype prot -parse_seqids
- capture query sequence :
blastdbcmd -entry ABL1_HUMAN -db ./swissprot > ABL1.fa
- default result :
blastp -db ./swissprot -query ./ABL1.fa -outfmt 6 -evalue 0.01 -max_target_seqs 434882 > default.txt
- matrix changed to BLOSUM45, BLOSUM62, or BLOSUM80 (shown for BLOSUM62) :
blastp -db ./swissprot -query ./ABL1.fa -outfmt 6 -evalue 0.01 -matrix BLOSUM62 -max_target_seqs 434882 > BL62.txt
- -word_size changes, values 3, 4, 5, 6, 7 (shown for 7) :
blastp -db ./swissprot -query ./ABL1.fa -outfmt 6 -evalue 0.01 -word_size 7 -max_target_seqs 434882 > ws7.txt
- -gapopen changes (shown for -gapopen 11, with -gapextend 1) :
blastp -db ./swissprot -query ./ABL1.fa -outfmt 6 -evalue 0.01 -gapopen 11 -max_target_seqs 434882 > go11.txt
- -gapextend changes, values 1 and 2 (shown for 2, with -gapopen 11) :
blastp -db ./swissprot -query ./ABL1.fa -outfmt 6 -evalue 0.01 -gapextend 2 -max_target_seqs 434882 > ge2.txt
- both -gapopen and -gapextend change :
blastp -db ./swissprot -query ./ABL1.fa -outfmt 6 -evalue 0.01 -gapopen 12 -gapextend 1 -max_target_seqs 434882 > go12ge1.txt
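The commands above differ only in one or two options and in the output file name, so they can be generated rather than typed one by one. The following is a minimal sketch, not the procedure used for this report: `blast_cmd` is a hypothetical helper name, and it only prints each command so that it can be checked before anything is run.

```shell
# Hypothetical helper: print (do not run) one blastp command.
# $1 = output file name; the remaining arguments are the extra blastp options.
blast_cmd() {
    out=$1
    shift
    echo "blastp -db ./swissprot -query ./ABL1.fa -outfmt 6 -evalue 0.01 $* -max_target_seqs 434882 > $out"
}

# Print one command for each word size tested in this report.
for ws in 3 4 5 6 7; do
    blast_cmd "ws$ws.txt" -word_size "$ws"
done
```

Once the printed commands look right, the same output can be piped to `sh` to execute them.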
Recording table
parameters | default | BLOSUM | BLOSUM | BLOSUM | ws | ws | ws | ws | ws |
---|---|---|---|---|---|---|---|---|---|
values | default | 45 | 62 | 80 | 3 | 4 | 5 | 6 | 7 |
hits | 3261 | 3188 | 3261 | 2985 | 3261 | 3291 | 3298 | 3300 | 3296 |
- ws : -word_size
parameters | default | -gapopen | -gapextend | (v1,v2) | (v1,v2) | (v1,v2) | (v1,v2) |
---|---|---|---|---|---|---|---|
values | default | 13 | 2 | (6,2) | (10,1) | (12,1) | (9,2) |
hits | 3261 | 3228 | 3219 | 3208 | 3255 | 3267 | 3289 |
- (v1,v2) : -gapopen v1 -gapextend v2
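The report does not say how the hit counts in the tables were obtained from the `-outfmt 6` files. One plausible way, shown here only as an assumption, is to count the distinct subject IDs in the second column, since a subject with several aligned segments appears on several rows. `count_hits` is a hypothetical helper name.

```shell
# Hypothetical helper: count distinct subject IDs (column 2) in a
# tab-separated -outfmt 6 file. Assumes one aligned segment per row.
count_hits() {
    cut -f 2 "$1" | sort -u | wc -l | tr -d ' '
}

# Example call (commented out because it needs a real result file):
# count_hits default.txt
```

Counting rows with `wc -l` instead would count aligned segments rather than distinct subjects, so the two numbers can differ.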
Discussions
BLOSUM: BLOSUM is a family of matrices used to score alignments between evolutionarily divergent protein sequences. BLOSUM45 is used for scoring alignments between more divergent sequences. In contrast, BLOSUM80 is used for scoring alignments between less divergent sequences.
word size: The length of the initial word (seed) matched between the query sequence and the database sequences.
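To make the idea of a word concrete, the sketch below lists every overlapping word of a given length in a short sequence. It is only an illustration: `words` is a hypothetical helper, the sequence `MKVLA` is made up, and blastp itself does more than this (for proteins it also considers similar words that score above a threshold, which this sketch ignores).

```shell
# Hypothetical helper: print every overlapping word of length $2 in sequence $1.
words() {
    awk -v s="$1" -v w="$2" 'BEGIN {
        for (i = 1; i + w - 1 <= length(s); i++) print substr(s, i, w)
    }'
}

# A made-up 5-residue sequence with word size 3 gives MKV, KVL, VLA.
words MKVLA 3
```

A longer word size gives fewer words per sequence, each of which is a more specific seed.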
gapopen and gapextend: Together these two parameters make up the gap cost. -gapopen is the penalty for opening a gap, and -gapextend is the penalty for each position by which a gap is extended. The extension penalty is charged separately from the opening penalty and is smaller in every setting tested here, so lengthening an existing gap costs less than opening a new one.
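As a small worked example, the sketch below computes the total penalty of a single gap, assuming the usual affine scheme in which a gap of length k costs gapopen + k × gapextend. `gap_cost` is a hypothetical helper, not part of BLAST.

```shell
# Hypothetical helper: total penalty of one gap under the assumed affine scheme.
# $1 = gapopen, $2 = gapextend, $3 = gap length in residues.
gap_cost() {
    echo $(( $1 + $2 * $3 ))
}

gap_cost 11 1 3   # default costs (11,1), gap of 3 residues: prints 14
gap_cost 9 2 3    # pair (9,2), same gap: prints 15
```

The comparison shows that a lower opening penalty does not always mean a cheaper gap once the extension penalty is taken into account.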
The results above are hard to interpret from the number of matching sequences alone. Several issues remain to be examined, such as false negatives and false positives, so the data need further processing before conclusions can be drawn.
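One possible first step in that further processing, offered only as a suggestion, is to see which subjects appear in one run but not in another. The sketch below assumes tab-separated `-outfmt 6` files with the subject ID in column 2; `only_in_first` is a hypothetical helper, and `BL80.txt` is an assumed file name for the BLOSUM80 run.

```shell
# Hypothetical helper: list subject IDs found in file $1 but not in file $2.
only_in_first() {
    a=$(mktemp)
    b=$(mktemp)
    cut -f 2 "$1" | sort -u > "$a"
    cut -f 2 "$2" | sort -u > "$b"
    comm -23 "$a" "$b"    # keep only lines unique to the first sorted list
    rm -f "$a" "$b"
}

# Example call (commented out because it needs real result files):
# only_in_first default.txt BL80.txt
```

The resulting IDs are candidates to check by hand; the list alone does not say whether they are true or false positives.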