Entrez

Explore and Introduce NCBI databases

Entrez is the query system for the NCBI databases. It is available through a web interface (http://www.ncbi.nlm.nih.gov) and, on the command line, through the Entrez Programming Utilities (eUtils).
The following table lists the NCBI databases and the corresponding database names used by eUtils.

Entrez database	Primary ID	eUtils database name
nucleotide	GI number	nucleotide
nucleotide	GI number	-nuccore
nucleotide	GI number	-nucest
nucleotide	GI number	-nucgss
Protein	GI number	protein
Structure	MMDB ID	structure
Domains	PSSM-ID	cdd
OMIM	MIM number	omim
PubMed	PMID	pubmed
SNP	SNP ID	snp
Taxonomy	TAXID	taxonomy

NCBI query syntax

A search query follows this format:

 term1[field1] Op term2[field2] Op term3[field3] ...

Op : one of AND, OR, NOT
field : restricts the search to a specific search field.

The 'Qualifier' column in the following table gives the value to use for [field] in the NCBI query syntax.

Search field	Qualifier
Accession	[ACCN]
All fields	[ALL]
Title	[TITL]
Author Name	[AUTH]
Feature Key	[FKEY]
Journal name	[JOUR]
Modification date	[MDAT]
Organism	[ORGN]
Properties	[PROP]
Publication date	[PDAT]
Sequence length	[SLEN]
Uid	[UID]

All of these qualifiers are available for nucleotide and protein sequence searches, except for the Feature key.
The Feature key does not apply to the protein database, and neither does UID (which is for PubMed).

Example.1

# Search all fields for records containing the words 'brca1', 'human' and 'cancer'. 'AND' is implied even when the operator is omitted.
brca1 AND human AND cancer

# The following query gives the same result.
brca1 human cancer

Example.2

# Search for brca1 in the title, restricted to the human organism.
brca1[title] AND human[orgn]

Example.3

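# Search for 'factor ix' in the title, restricted to the organism 'mus musculus' and to sequence lengths from 300 to 500.
'factor ix'[titl] AND 'mus musculus'[orgn] AND 300[slen]:500[slen]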
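
The query format above can also be assembled programmatically. The following is a minimal Python sketch, not part of any NCBI tool: the helper names `field_term` and `combine` are hypothetical, and they only concatenate strings according to the `term[field] Op term[field]` rule. Nothing is sent to NCBI.

```python
# Hypothetical helpers for composing Entrez query strings.
# They only build text; they do not validate qualifiers or contact NCBI.

VALID_OPS = {"AND", "OR", "NOT"}


def field_term(term, field=None):
    """Return 'term[field]', or just 'term' when no field is given."""
    return f"{term}[{field}]" if field else term


def combine(op, *parts):
    """Join already-formatted parts with one Boolean operator."""
    if op not in VALID_OPS:
        raise ValueError(f"unknown operator: {op}")
    return f" {op} ".join(parts)


# Rebuild the query from Example.2, using the qualifiers from the table.
query = combine("AND", field_term("brca1", "titl"), field_term("human", "orgn"))
print(query)  # brca1[titl] AND human[orgn]
```

Keeping the operator check in one place makes a typo such as 'ADN' fail early instead of silently producing a different search.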

Entrez programming utilities: eUtils

The eUtils are a set of seven programs that provide an interface for Entrez queries against the NCBI database systems. The search itself, with its terms and parameters, runs on the NCBI server.

Program	Function
EInfo	Provides (1) the number of records indexed in each field of a specific database; (2) the date of the last update of the database; (3) the links from the database to other Entrez databases.
EGQuery	Returns the number of records matching the query in each Entrez database.
ESearch	Returns the list of UIDs matching the query in a specific database, as well as the term translations of the query.
ESummary	Takes a list of UIDs and returns their corresponding document summaries.
EPost	Lets users store sets of UIDs on the history server, and returns a corresponding query key and web environment.
EFetch	Takes a list of UIDs and returns their corresponding data records.
ELink	Takes a list of UIDs in a specific database and returns either a list of related IDs in the same database or a list of linked IDs in another Entrez database.

NCBI query parameters

All eUtils scripts share the following base URL:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils

A call to EFetch, for example, takes the following form:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?key1=value1&key2=value2&key3=value3&...
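
The key=value pattern above can be produced with Python's standard library. This is a minimal sketch under stated assumptions: the helper name `eutils_url` is hypothetical, and the `db` and `term` parameters are the ones that appear in the ESearch example in this section. It only builds the URL string and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base URL as given in the text above.
BASE_URL = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def eutils_url(script, **params):
    """Build an eUtils URL; `script` is a program name such as 'esearch' or 'efetch'."""
    # urlencode joins the pairs with '&', turns spaces into '+'
    # and percent-encodes the square brackets.
    return f"{BASE_URL}/{script}.fcgi?{urlencode(params)}"


url = eutils_url("esearch", db="nucleotide", term="brca1[title] AND human[orgn]")
print(url)
```

Letting `urlencode` handle the escaping avoids hand-writing '+' between words, and the same helper would serve the other programs by changing the script name.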

Example.1:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=science[journal]+AND+breast+cancer+AND+2008[pdat]

Entering the URL above in a browser returns the following content.

<eSearchResult>
  <Count>6</Count>
  <RetMax>6</RetMax>
  <RetStart>0</RetStart>
  <IdList>
    <Id>19008416</Id>
    <Id>18927361</Id>
    <Id>18787170</Id>
    <Id>18487186</Id>
    <Id>18239126</Id>
    <Id>18239125</Id>
  </IdList>
  <TranslationSet>
    <Translation>
      <From>science[journal]</From>
      <To>
      "Science"[Journal] OR "Science (80- )"[Journal] OR "J Zhejiang Univ Sci"[Journal]
      </To>
    </Translation>
    <Translation>
      <From>breast cancer</From>
      <To>
      "breast neoplasms"[MeSH Terms] OR ("breast"[All Fields] AND "neoplasms"[All Fields]) OR "breast neoplasms"[All Fields] OR ("breast"[All Fields] AND "cancer"[All Fields]) OR "breast cancer"[All Fields]
      </To>
    </Translation>
  </TranslationSet>
  <TranslationStack>
    <TermSet>
      <Term>"Science"[Journal]</Term>
      <Field>Journal</Field>
      <Count>169143</Count>
      <Explode>N</Explode>
    </TermSet>
    <TermSet>
      <Term>"Science (80- )"[Journal]</Term>
      <Field>Journal</Field>
      <Count>10</Count>
      <Explode>N</Explode>
    </TermSet>
    <OP>OR</OP>
    <TermSet>
      <Term>"J Zhejiang Univ Sci"[Journal]</Term>
      <Field>Journal</Field>
      <Count>364</Count>
      <Explode>N</Explode>
    </TermSet>
    <OP>OR</OP>
    <OP>GROUP</OP>
    <TermSet>
      <Term>"breast neoplasms"[MeSH Terms]</Term>
      <Field>MeSH Terms</Field>
      <Count>239359</Count>
      <Explode>Y</Explode>
    </TermSet>
    <TermSet>
      <Term>"breast"[All Fields]</Term>
      <Field>All Fields</Field>
      <Count>405108</Count>
      <Explode>N</Explode>
    </TermSet>
    <TermSet>
      <Term>"neoplasms"[All Fields]</Term>
      <Field>All Fields</Field>
      <Count>2240697</Count>
      <Explode>N</Explode>
    </TermSet>
    <OP>AND</OP>
    <OP>GROUP</OP>
    <OP>OR</OP>
    <TermSet>
      <Term>"breast neoplasms"[All Fields]</Term>
      <Field>All Fields</Field>
      <Count>239433</Count>
      <Explode>N</Explode>
    </TermSet>
    <OP>OR</OP>
    <TermSet>
      <Term>"breast"[All Fields]</Term>
      <Field>All Fields</Field>
      <Count>405108</Count>
      <Explode>N</Explode>
    </TermSet>
    <TermSet>
      <Term>"cancer"[All Fields]</Term>
      <Field>All Fields</Field>
      <Count>1701091</Count>
      <Explode>N</Explode>
    </TermSet>
    <OP>AND</OP>
    <OP>GROUP</OP>
    <OP>OR</OP>
    <TermSet>
      <Term>"breast cancer"[All Fields]</Term>
      <Field>All Fields</Field>
      <Count>206821</Count>
      <Explode>N</Explode>
    </TermSet>
    <OP>OR</OP>
    <OP>GROUP</OP>
    <OP>AND</OP>
    <TermSet>
      <Term>2008[pdat]</Term>
      <Field>pdat</Field>
      <Count>833635</Count>
      <Explode>N</Explode>
    </TermSet>
    <OP>AND</OP>
  </TranslationStack>
  <QueryTranslation>
  ("Science"[Journal] OR "Science (80- )"[Journal] OR "J Zhejiang Univ Sci"[Journal]) AND ("breast neoplasms"[MeSH Terms] OR ("breast"[All Fields] AND "neoplasms"[All Fields]) OR "breast neoplasms"[All Fields] OR ("breast"[All Fields] AND "cancer"[All Fields]) OR "breast cancer"[All Fields]) AND 2008[pdat]
  </QueryTranslation>
</eSearchResult>

The information is returned as an XML file. This file can then be processed further, for example by a Perl script that extracts information with regular expressions. EFetch, ESearch and ELink are examples of programs whose output can be handled this way: the XML is returned, and the Perl script is then used to 'grep' the information in it.
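
As an illustration of the extraction step, here is a minimal sketch in Python rather than Perl, and it uses the standard `xml.etree.ElementTree` parser instead of regular expressions. This is a substitution made for this example, not the approach described above. The function name `parse_esearch` is hypothetical, and the sample is an abbreviated copy of the ESearch result shown earlier.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the ESearch result shown above (first three IDs only).
SAMPLE = """<eSearchResult>
  <Count>6</Count>
  <RetMax>6</RetMax>
  <RetStart>0</RetStart>
  <IdList>
    <Id>19008416</Id>
    <Id>18927361</Id>
    <Id>18787170</Id>
  </IdList>
</eSearchResult>"""


def parse_esearch(xml_text):
    """Return (total hit count, list of UIDs) from an eSearchResult document."""
    root = ET.fromstring(xml_text)
    # findtext("Count") only looks at direct children of the root, so the
    # per-term <Count> elements inside <TranslationStack> are not picked up.
    count = int(root.findtext("Count"))
    ids = [node.text for node in root.findall("./IdList/Id")]
    return count, ids


count, ids = parse_esearch(SAMPLE)
print(count, ids)
```

A parser is less fragile than pattern matching here because the same tag name, such as `<Count>`, occurs at several nesting levels in the result.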
