Entrez
Introduction to the NCBI databases
Entrez is the query system for the NCBI databases. It is available through a web interface (http://www.ncbi.nlm.nih.gov) and, for programmatic access, through the Entrez Programming Utilities (eUtils).
The following table lists the Entrez databases, their primary IDs, and the database names used by eUtils.
Entrez database | Primary ID | eUtils database name |
---|---|---|
Nucleotide | GI number | nucleotide |
Nucleotide (core subset) | GI number | nuccore |
Nucleotide (EST subset) | GI number | nucest |
Nucleotide (GSS subset) | GI number | nucgss |
Protein | GI number | protein |
Structure | MMDB ID | structure |
Domains | PSSM-ID | cdd |
OMIM | MIM number | omim |
PubMed | PMID | pubmed |
SNP | SNP ID | snp |
Taxonomy | TAXID | taxonomy |
NCBI query syntax
- A search query has the following general form:
term1[field1] Op term2[field2] Op term3[field3] ...
- Op: a Boolean operator, one of AND, OR or NOT (written in uppercase).
- field: restricts the term to a specific search field. Use the value in the 'Qualifier' column of the following table as [field].
Search field | Qualifier |
---|---|
Accession | [ACCN] |
All fields | [ALL] |
Title | [TITL] |
Author Name | [AUTH] |
Feature Key | [FKEY] |
Journal name | [JOUR] |
Modification date | [MDAT] |
Organism | [ORGN] |
Properties | [PROP] |
Publication date | [PDAT] |
Sequence length | [SLEN] |
UID | [UID] |
- All of these fields are available for nucleotide and protein sequences, except for Feature Key.
- Feature Key does not apply to the protein database, nor does UID apply to PubMed.
- Example 1
# Search all fields for records containing the words 'brca1', 'human' and 'cancer'. Terms separated only by spaces are combined with AND implicitly.
brca1 AND human AND cancer
# gives the same result as:
brca1 human cancer
- Example 2
# Search for brca1 in the title, restricted to the human organism.
brca1[title] AND human[orgn]
- Example 3
# Search titles for the phrase "factor ix" in Mus musculus sequences 300 to 500 residues long.
"factor ix"[titl] AND "mus musculus"[orgn] AND 300[slen]:500[slen]
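Because a query is just terms tagged with [field] and joined by Boolean operators, it can be assembled programmatically. The following Python sketch shows one possible way; `field_term`, `range_term` and `build_query` are hypothetical helpers, not part of eUtils. They wrap multi-word phrases in double quotes, which is how Entrez marks a phrase search, and build the query of Example 3.

```python
def field_term(term: str, field: str) -> str:
    """Tag one term with a field qualifier, quoting multi-word phrases."""
    if " " in term:
        term = f'"{term}"'
    return f"{term}[{field}]"


def range_term(low: int, high: int, field: str) -> str:
    """Build a range search in the form low[field]:high[field]."""
    return f"{low}[{field}]:{high}[{field}]"


def build_query(parts: list[str], op: str = "AND") -> str:
    """Join already-tagged parts with a single Boolean operator."""
    op = op.upper()  # Entrez expects uppercase AND / OR / NOT
    if op not in {"AND", "OR", "NOT"}:
        raise ValueError(f"unsupported operator: {op}")
    return f" {op} ".join(parts)


# Example 3: phrase in title, organism, and a sequence-length range.
query = build_query([
    field_term("factor ix", "titl"),
    field_term("mus musculus", "orgn"),
    range_term(300, 500, "slen"),
])
print(query)  # "factor ix"[titl] AND "mus musculus"[orgn] AND 300[slen]:500[slen]
```

Keeping one operator per call keeps precedence explicit; mixed expressions can be built by nesting calls and adding parentheses around the inner result.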
Entrez programming utilities: eUtils
- eUtils is a set of seven server-side programs that provide a programmatic interface to Entrez queries across the NCBI database system. The searches themselves, with all their terms and parameters, run on the NCBI server; the client only sends a request and receives the results.
Program | Function |
---|---|
EInfo | Returns (1) the number of indexed records in each field of a given database, (2) the date of the database's last update, and (3) the links from the database to other Entrez databases. |
EGQuery | Returns the number of records matching the query in each Entrez database. |
ESearch | Returns the list of UIDs matching the query in a given database, together with the term translations of the query. |
ESummary | Given a list of UIDs, returns the corresponding document summaries. |
EPost | Stores a set of UIDs on the History server and returns the corresponding query key and Web environment (WebEnv). |
EFetch | Given a list of UIDs, returns the corresponding full data records. |
ELink | Given a list of UIDs in one database, returns the related UIDs in the same database or the linked UIDs in another Entrez database. |
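The EPost row describes how the programs chain together: a result set stored on the History server is identified by a query key plus a WebEnv, and a later EFetch can reuse it instead of resending the UIDs. The Python sketch below reads `QueryKey` and `WebEnv` from an EPost response (an ESearch response requested with `usehistory=y` carries the same elements) and builds the matching EFetch URL. The function name and the sample response are hypothetical; `query_key`, `WebEnv`, `rettype` and `retmode` are E-utilities parameters, and the base URL uses `https` on the assumption that NCBI now serves eUtils over HTTPS.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed base URL (https); the notes above list the same path over http.
BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def efetch_from_history(result_xml: str, db: str, rettype: str = "fasta") -> str:
    """Build an EFetch URL that reuses a result set on the History server."""
    root = ET.fromstring(result_xml)
    # Both values are direct children of the response's root element.
    query_key = root.findtext("QueryKey")
    webenv = root.findtext("WebEnv")
    if query_key is None or webenv is None:
        raise ValueError("no QueryKey/WebEnv in response (was usehistory=y set?)")
    params = {
        "db": db,
        "query_key": query_key,
        "WebEnv": webenv,
        "rettype": rettype,
        "retmode": "text",
    }
    return f"{BASE}/efetch.fcgi?{urlencode(params)}"


# Hypothetical EPost response; real WebEnv values are long opaque tokens.
sample = "<ePostResult><QueryKey>1</QueryKey><WebEnv>MCID_example</WebEnv></ePostResult>"
print(efetch_from_history(sample, db="nuccore"))
```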
NCBI query parameters
- All eUtils scripts share the following base URL:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils
A call appends the script name (for example, efetch.fcgi for EFetch), followed by its parameters as key=value pairs joined by '&':
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?key1=value1&key2=value2&key3=value3&...
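Since every call is the base URL, a script name and URL-encoded key=value pairs, a request URL can be assembled with Python's standard `urllib.parse.urlencode`. In this sketch `eutils_url` is a hypothetical helper, and the base URL uses `https` on the assumption that NCBI now serves eUtils over HTTPS. Passing `safe="[]"` keeps the field brackets readable while spaces become `+`, so the result matches the ESearch URL of Example 1 apart from the scheme.

```python
from urllib.parse import urlencode

# Assumed base URL shared by all eUtils scripts (https; see lead-in).
BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def eutils_url(script: str, **params: str) -> str:
    """Build BASE/<script>.fcgi?key1=value1&key2=value2&..."""
    # Spaces become '+', other reserved characters are percent-encoded;
    # '[' and ']' are left as-is because field qualifiers use them.
    query = urlencode(params, safe="[]")
    return f"{BASE}/{script}.fcgi?{query}"


url = eutils_url(
    "esearch",
    db="pubmed",
    term="science[journal] AND breast cancer AND 2008[pdat]",
)
print(url)
```

Percent-encoding the brackets as well (the `urlencode` default) would also produce a valid request; keeping them readable only makes the URL easier to compare with the query text.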
- Example 1: search PubMed for articles on breast cancer published in the journal Science in 2008:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=science[journal]+AND+breast+cancer+AND+2008[pdat]
Opening the above URL in a browser returns the following XML:
<eSearchResult>
<Count>6</Count>
<RetMax>6</RetMax>
<RetStart>0</RetStart>
<IdList>
<Id>19008416</Id>
<Id>18927361</Id>
<Id>18787170</Id>
<Id>18487186</Id>
<Id>18239126</Id>
<Id>18239125</Id>
</IdList>
<TranslationSet>
<Translation>
<From>science[journal]</From>
<To>
"Science"[Journal] OR "Science (80- )"[Journal] OR "J Zhejiang Univ Sci"[Journal]
</To>
</Translation>
<Translation>
<From>breast cancer</From>
<To>
"breast neoplasms"[MeSH Terms] OR ("breast"[All Fields] AND "neoplasms"[All Fields]) OR "breast neoplasms"[All Fields] OR ("breast"[All Fields] AND "cancer"[All Fields]) OR "breast cancer"[All Fields]
</To>
</Translation>
</TranslationSet>
<TranslationStack>
<TermSet>
<Term>"Science"[Journal]</Term>
<Field>Journal</Field>
<Count>169143</Count>
<Explode>N</Explode>
</TermSet>
<TermSet>
<Term>"Science (80- )"[Journal]</Term>
<Field>Journal</Field>
<Count>10</Count>
<Explode>N</Explode>
</TermSet>
<OP>OR</OP>
<TermSet>
<Term>"J Zhejiang Univ Sci"[Journal]</Term>
<Field>Journal</Field>
<Count>364</Count>
<Explode>N</Explode>
</TermSet>
<OP>OR</OP>
<OP>GROUP</OP>
<TermSet>
<Term>"breast neoplasms"[MeSH Terms]</Term>
<Field>MeSH Terms</Field>
<Count>239359</Count>
<Explode>Y</Explode>
</TermSet>
<TermSet>
<Term>"breast"[All Fields]</Term>
<Field>All Fields</Field>
<Count>405108</Count>
<Explode>N</Explode>
</TermSet>
<TermSet>
<Term>"neoplasms"[All Fields]</Term>
<Field>All Fields</Field>
<Count>2240697</Count>
<Explode>N</Explode>
</TermSet>
<OP>AND</OP>
<OP>GROUP</OP>
<OP>OR</OP>
<TermSet>
<Term>"breast neoplasms"[All Fields]</Term>
<Field>All Fields</Field>
<Count>239433</Count>
<Explode>N</Explode>
</TermSet>
<OP>OR</OP>
<TermSet>
<Term>"breast"[All Fields]</Term>
<Field>All Fields</Field>
<Count>405108</Count>
<Explode>N</Explode>
</TermSet>
<TermSet>
<Term>"cancer"[All Fields]</Term>
<Field>All Fields</Field>
<Count>1701091</Count>
<Explode>N</Explode>
</TermSet>
<OP>AND</OP>
<OP>GROUP</OP>
<OP>OR</OP>
<TermSet>
<Term>"breast cancer"[All Fields]</Term>
<Field>All Fields</Field>
<Count>206821</Count>
<Explode>N</Explode>
</TermSet>
<OP>OR</OP>
<OP>GROUP</OP>
<OP>AND</OP>
<TermSet>
<Term>2008[pdat]</Term>
<Field>pdat</Field>
<Count>833635</Count>
<Explode>N</Explode>
</TermSet>
<OP>AND</OP>
</TranslationStack>
<QueryTranslation>
("Science"[Journal] OR "Science (80- )"[Journal] OR "J Zhejiang Univ Sci"[Journal]) AND ("breast neoplasms"[MeSH Terms] OR ("breast"[All Fields] AND "neoplasms"[All Fields]) OR "breast neoplasms"[All Fields] OR ("breast"[All Fields] AND "cancer"[All Fields]) OR "breast cancer"[All Fields]) AND 2008[pdat]
</QueryTranslation>
</eSearchResult>
- As shown above, the eUtils return their results as XML. In this workflow the XML is then processed by a Perl script that extracts the needed information with regular expressions, in effect 'grepping' the XML. Results from ESearch, EFetch and ELink are typical inputs for this step.
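Regular expressions work for simple extractions, but an XML parser is less brittle when the layout of the response changes. As a swapped-in alternative to the Perl approach described above, this Python sketch uses the standard `xml.etree.ElementTree` module to read `Count` and the `IdList` from an ESearch result. `parse_esearch` is a hypothetical name, and `sample` is the response shown above with the translation elements omitted.

```python
import xml.etree.ElementTree as ET


def parse_esearch(xml_text: str) -> tuple[int, list[str]]:
    """Return (total match count, UIDs in this page) from an ESearch result."""
    root = ET.fromstring(xml_text)
    count = int(root.findtext("Count", default="0"))
    # Each <Id> sits under <IdList>, a direct child of <eSearchResult>.
    ids = [el.text for el in root.findall("IdList/Id")]
    return count, ids


# Abbreviated copy of the ESearch response above.
sample = """<eSearchResult>
<Count>6</Count>
<RetMax>6</RetMax>
<RetStart>0</RetStart>
<IdList>
<Id>19008416</Id>
<Id>18927361</Id>
<Id>18787170</Id>
<Id>18487186</Id>
<Id>18239126</Id>
<Id>18239125</Id>
</IdList>
</eSearchResult>"""

count, ids = parse_esearch(sample)
print(count, ids)
```

Note that `Count` is the total number of matches, while `IdList` holds at most `RetMax` UIDs (20 by default), so a script that needs every UID should page through the results with `retstart`/`retmax` or use the History server.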