More operations on BLAST

  • Environment (OS): Linux-2.6.32-358.14.1.el6.i686, CentOS-6.4

DataSets and Default parameters


  • search database: swissprot (taken from the textbook's resources; downloadable from http://bio.biomedicine.gu.se/gb/7/swissprot)

  • query sequence: ABL1_HUMAN (extracted from the swissprot database with the blastdbcmd command and its -entry option)

  • default parameters of blastp (these are the program defaults; the operations below override some of them)

    Parameter             Value         Parameter               Value
    Gap opening penalty   11            Gap extension penalty   1
    Nucleic match         n/a           Nucleic mismatch        n/a
    Expectation value     10.0 (real)   Word size               3
    Max scores            25            Max alignments          15
    Query filter          SEG           Query genetic code      n/a
    Matrix                BLOSUM62

  • blastp parameters tested and their values

    Parameters                  Values                                    Descriptions
    -matrix                     BLOSUM: 45, 62, 80                        -
    -word_size                  3, 4, 5, 6, 7                             -
    -gapopen (-gapextend 1)     13                                        1, 21 not suitable for BLOSUM62
    -gapextend (-gapopen 11)    2                                         3 not suitable for BLOSUM62
    -gapopen v1 -gapextend v2   (v1,v2) = (6,2), (9,2), (10,1), (12,1)    values on the left differ from the default
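
To get a feel for how the tested gap settings compare, the sketch below computes the penalty of a single gap under each (gapopen, gapextend) pair. It assumes the usual affine gap model, in which a gap of length k costs gapopen + k × gapextend; the helper name gap_cost and the example gap length of 3 are illustrative choices, not part of BLAST+ or of the original notes.

```shell
#!/bin/sh
# Affine gap cost (assumed model): opening penalty + length * extension penalty.
# gap_cost is a hypothetical helper, not a BLAST+ command.
gap_cost() {
    gapopen=$1
    gapextend=$2
    length=$3
    echo $(( gapopen + gapextend * length ))
}

# Cost of a 3-residue gap under the default (11,1) and the tested pairs.
for pair in "11 1" "6 2" "9 2" "10 1" "12 1"; do
    set -- $pair
    echo "gapopen=$1 gapextend=$2 length=3 cost=$(gap_cost "$1" "$2" 3)"
done
```

Under this model a pair such as (6,2) makes short gaps cheaper than the default but long gaps more expensive, which is one reason the hit counts need not move in a single direction.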

Recording table and command line operations in the Linux environment


  • command line operations in the Linux environment:

    1. create database :
      makeblastdb -in ./swissprot -dbtype prot -parse_seqids
      
    2. capture query sequence :
      blastdbcmd -entry ABL1_HUMAN -db ./swissprot > ABL1.fa
      
    3. default result :
      blastp -db ./swissprot -query ./ABL1.fa -outfmt 6 -evalue 0.01 -max_target_seqs 434882 > default.txt
      
    4. matrix changed to BLOSUM 45, 62, 80 :
      # -matrix BLOSUM45 (BLOSUM62, BLOSUM80) 
      blastp -db ./swissprot -query ./ABL1.fa -outfmt 6 -evalue 0.01 -matrix BLOSUM62 -max_target_seqs 434882 > BL62.txt
      
    5. word_size changes :
      # -word_size 3 (4, 5, 6, 7) 
      blastp -db ./swissprot -query ./ABL1.fa -outfmt 6 -evalue 0.01 -word_size 7 -max_target_seqs 434882 > ws7.txt
      
    6. -gapopen changes
      # -gapopen 11 (13); -gapextend stays at 1 here
      blastp -db ./swissprot -query ./ABL1.fa -outfmt 6 -evalue 0.01 -gapopen 11 -max_target_seqs 434882 > go11.txt
      
    7. -gapextend changes
      # -gapextend 1 (2); -gapopen stays at 11 here
      blastp -db ./swissprot -query ./ABL1.fa -outfmt 6 -evalue 0.01 -gapextend 2 -max_target_seqs 434882 > ge2.txt
      
    8. both -gapopen and -gapextend change
      blastp -db ./swissprot -query ./ABL1.fa -outfmt 6 -evalue 0.01 -gapopen 12 -gapextend 1 -max_target_seqs 434882 > go12ge1.txt
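
The runs above differ only in one or two options, so they can be generated in a loop. The following is a minimal sketch that only prints the command lines (a dry run) instead of executing them; the helper name run_variant is hypothetical, and the output file names simply follow the pattern used above (BL62.txt, ws7.txt).

```shell
#!/bin/sh
# Dry run: print one blastp command per tested setting.
# run_variant is a hypothetical helper; pipe the printed lines to sh to execute them.
DB=./swissprot
QUERY=./ABL1.fa

run_variant() {
    # $1 = output file name; remaining arguments = extra blastp options
    out=$1
    shift
    printf '%s\n' "blastp -db $DB -query $QUERY -outfmt 6 -evalue 0.01 $* -max_target_seqs 434882 > $out"
}

# Substitution matrices
for n in 45 62 80; do
    run_variant "BL$n.txt" -matrix "BLOSUM$n"
done

# Word sizes
for w in 3 4 5 6 7; do
    run_variant "ws$w.txt" -word_size "$w"
done

# Gap cost pairs (gapopen, gapextend)
for pair in "6 2" "9 2" "10 1" "12 1"; do
    set -- $pair
    run_variant "go$1ge$2.txt" -gapopen "$1" -gapextend "$2"
done
```

Printing first makes it easy to check every command line before spending time on the searches.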
      
  • Recording table

    parameters   default   BLOSUM   BLOSUM   BLOSUM   ws     ws     ws     ws     ws
    values       default   45       62       80       3      4      5      6      7
    hits         3261      3188     3261     2985     3261   3291   3298   3300   3296

  1. ws : -word_size

    parameters   default   -gapopen   -gapextend   (v1,v2)   (v1,v2)   (v1,v2)   (v1,v2)
    values       default   13         2            (6,2)     (10,1)    (12,1)    (9,2)
    hits         3261      3228       3219         3208      3255      3267      3289

  1. (v1,v2) : -gapopen v1 -gapextend v2
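
The notes do not say how the hit counts were obtained. One plausible way, sketched below, is to count the distinct subject identifiers in the second column (sseqid) of each tabular (-outfmt 6) result file. Whether the table counts distinct subjects or raw output lines is an assumption, and count_hits is a hypothetical helper name.

```shell
#!/bin/sh
# Count distinct subject sequences (column 2, sseqid) in a -outfmt 6 result file.
# count_hits is a hypothetical helper; counting distinct subjects is an assumption.
count_hits() {
    cut -f2 "$1" | sort -u | wc -l | tr -d ' '
}

# Report the count for each result file that exists in the current directory.
for f in default.txt BL62.txt ws7.txt go11.txt ge2.txt go12ge1.txt; do
    if [ -f "$f" ]; then
        echo "$f: $(count_hits "$f")"
    fi
done
```

If raw alignment lines are wanted instead, replacing the pipeline with a plain line count of the file would give that number.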

Discussions


  • BLOSUM: BLOSUM is the family of matrices used to score alignments between evolutionarily divergent protein sequences. BLOSUM45 is suited to scoring alignments between more divergent sequences, whereas BLOSUM80 is suited to scoring alignments between less divergent sequences.

  • word size: the length of the initial word match between a database sequence and the query sequence, from which an alignment is seeded.

  • gapopen and gapextend: together these two parameters make up the gap cost. gapopen is the penalty for opening a gap, and gapextend is the penalty for each position by which an open gap is extended to reach a better-scoring alignment; the extension penalty is charged separately from the opening penalty and usually differs from it.

  • The hit counts above are hard to interpret on their own. Several issues remain to be examined, such as false negatives and false positives, so further processing of the data is required.
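
As a possible first step in that further processing, the sketch below lists the subject identifiers that appear in one result file but not in another, for example hits gained or lost relative to the default run. It does not by itself decide which differences are false positives or false negatives. The helper names subject_ids and only_in_first are hypothetical.

```shell
#!/bin/sh
# Compare the subject IDs (column 2 of -outfmt 6 output) of two runs.
# subject_ids and only_in_first are hypothetical helper names.
subject_ids() {
    cut -f2 "$1" | sort -u
}

# Print IDs present in the first file but absent from the second.
only_in_first() {
    ids_first=$(mktemp)
    ids_second=$(mktemp)
    subject_ids "$1" > "$ids_first"
    subject_ids "$2" > "$ids_second"
    comm -23 "$ids_first" "$ids_second"
    rm -f "$ids_first" "$ids_second"
}

# Example: hits lost and gained when switching word size from the default to 7.
if [ -f default.txt ] && [ -f ws7.txt ]; then
    echo "only in default.txt:"
    only_in_first default.txt ws7.txt
    echo "only in ws7.txt:"
    only_in_first ws7.txt default.txt
fi
```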
