PDB Format Guide, Part 34

3. Primary Structure Section

The primary structure section of a PDB file contains the sequence of residues in each chain of the macromolecule. Embedded in these records are chain identifiers and sequence numbers that allow other records to link into the sequence.

DBREF

Overview

The DBREF record provides cross-reference links between PDB sequences and the corresponding database entry or entries. A cross reference to the sequence database is mandatory for each peptide chain with a length greater than ten (10) residues. For nucleic acid entries a DBREF record pointing to the Nucleic Acid Database (NDB) is mandatory when the corresponding entry exists in NDB.

Record Format

COLUMNS       DATA TYPE       FIELD          DEFINITION
--------------------------------------------------------------------------------
 1 -  6       Record name     "DBREF "

 8 - 11       IDcode          idCode         ID code of this entry.

13            Character       chainID        Chain identifier.

15 - 18       Integer         seqBegin       Initial sequence number of the PDB
                                             sequence segment.

19            AChar           insertBegin    Initial insertion code of the PDB
                                             sequence segment.

21 - 24       Integer         seqEnd         Ending sequence number of the PDB
                                             sequence segment.

25            AChar           insertEnd      Ending insertion code of the PDB
                                             sequence segment.

27 - 32       LString         database       Sequence database name.  "PDB" when
                                             a corresponding sequence database
                                             entry has not been identified.

34 - 41       LString         dbAccession    Sequence database accession code.
                                             For GenBank entries, this is the
                                             NCBI gi number.

43 - 54       LString         dbIdCode       Sequence database identification
                                             code.  For GenBank entries, this is
                                             the accession code.

56 - 60       Integer         dbseqBegin     Initial sequence number of the
                                             database segment.

61            AChar           idbnsBeg       Insertion code of initial residue
                                             of the segment, if PDB is the
                                             reference.

63 - 67       Integer         dbseqEnd       Ending sequence number of the
                                             database segment.

68            AChar           dbinsEnd       Insertion code of the ending
                                             residue of the segment, if PDB is
                                             the reference.
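The column layout above maps directly onto fixed-width string slicing. The following is a minimal Python sketch, not an official parser: the names DbRef, parse_dbref, and the helper functions are hypothetical, and returning None for blank integer fields is an assumption made here so that records without sequence numbers can still be read.

```python
from typing import NamedTuple, Optional


class DbRef(NamedTuple):
    """One DBREF record; attribute names are illustrative, not part of the format."""
    id_code: str
    chain_id: str
    seq_begin: Optional[int]
    insert_begin: str
    seq_end: Optional[int]
    insert_end: str
    database: str
    db_accession: str
    db_id_code: str
    db_seq_begin: Optional[int]
    db_ins_begin: str
    db_seq_end: Optional[int]
    db_ins_end: str


def _cols(line: str, first: int, last: int) -> str:
    """Return columns first..last (1-based, inclusive) with blanks stripped."""
    return line[first - 1:last].strip()


def _int_or_none(text: str) -> Optional[int]:
    """Assumption: a blank integer field means 'not given' and becomes None."""
    return int(text) if text else None


def parse_dbref(line: str) -> DbRef:
    """Split a DBREF line into fields using the column ranges in the table."""
    if not line.startswith("DBREF "):
        raise ValueError("not a DBREF record")
    return DbRef(
        id_code=_cols(line, 8, 11),
        chain_id=_cols(line, 13, 13),
        seq_begin=_int_or_none(_cols(line, 15, 18)),
        insert_begin=_cols(line, 19, 19),
        seq_end=_int_or_none(_cols(line, 21, 24)),
        insert_end=_cols(line, 25, 25),
        database=_cols(line, 27, 32),
        db_accession=_cols(line, 34, 41),
        db_id_code=_cols(line, 43, 54),
        db_seq_begin=_int_or_none(_cols(line, 56, 60)),
        db_ins_begin=_cols(line, 61, 61),
        db_seq_end=_int_or_none(_cols(line, 63, 67)),
        db_ins_end=_cols(line, 68, 68),
    )
```

Slicing by column rather than splitting on whitespace matters here: the chain identifier and the insertion codes may be blank, so a whitespace split would shift later fields into the wrong positions.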

Details

* PDB entries contain multi-chain molecules with sequences that may be wild type, variant, or synthetic. Sequences may also have been modified through site-directed mutagenesis experiments (engineered). A number of PDB entries report structures of domains cleaved from larger molecules.

* The DBREF record was designed to account for these differences by providing explicit correlations between contiguous segments of sequences as given in the PDB ATOM records and the sequence database entry. Several cases are easily represented by means of pointers between the databases using DBREF. PDB entries containing heteropolymers are linked to different sequence database entries. In some cases, such as PDB entries containing immunoglobulin Fab fragments, each chain is linked to two different SWISS-PROT, PIR, and/or GenBank entries. This facility is needed because these databases represent the sequences of the various immunoglobulin domains as separate entries. DBREF can also represent molecules engineered by altering the gene (fusing genes, altering sequences, creating chimeras, or circularly permuting sequences). This design has the additional advantage that pointers can be constructed to other relevant databases such as the Nucleic Acid Database, BioMagResBank, and databases describing sequence motifs (e.g., PROSITE, BLOCKS).

* Database names and their abbreviations as used in DBREF records:

   Database name                            database (code in columns 27 - 32)
   ---------------------------------------------------------------------------
   BioMagResBank                            BMRB
   BLOCKS                                   BLOCKS
   European Molecular Biology Laboratory    EMBL
   GenBank                                  GB
   Genome Data Base                         GDB
   Nucleic Acid Database                    NDB
   PROSITE                                  PROSIT
   Protein Data Bank                        PDB
   Protein Identification Resource          PIR
   SWISS-PROT                               SWS
   TREMBL                                   TREMBL

* When no sequence numbers are given (columns 15 - 25 and 56 - 68), the mapping is between database entries rather than between segments within an entry. This form is normally used, for example, to point to the related NDB entry.

* DBREF records present sequence correlations between PDB ATOM records and the corresponding entries in PIR, GenBank, SWISS-PROT, and the other listed databases.

* PDB does not guarantee that all possible references to the listed databases will be provided. In most cases, only one reference to a sequence database will be provided.

* When residues are missing from a chain, primarily because of disorder, the PDB entry contains several DBREF records for that chain, each linking an observed sequence segment to a sequence database entry.

* If no reference is found in the sequence databases, then the PDB entry itself is given as the reference.

* Selection of the appropriate sequence database entry or entries to be linked to a PDB entry is done on the basis of the sequence and its biological source. Questions on entry assignment that may arise are resolved by consultation with database staff.
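Several of the points above can be checked mechanically: the database code in columns 27 - 32, whether the PDB entry itself is the reference, and whether the record omits sequence numbers and therefore maps entry to entry. The sketch below illustrates this in Python under stated assumptions: the function name describe_dbref and the returned dictionary keys are hypothetical, and the code table simply transcribes the abbreviations listed in this section.

```python
# Abbreviations as listed in this section (code in columns 27 - 32).
DATABASE_NAMES = {
    "BMRB": "BioMagResBank",
    "BLOCKS": "BLOCKS",
    "EMBL": "European Molecular Biology Laboratory",
    "GB": "GenBank",
    "GDB": "Genome Data Base",
    "NDB": "Nucleic Acid Database",
    "PROSIT": "PROSITE",
    "PDB": "Protein Data Bank",
    "PIR": "Protein Identification Resource",
    "SWS": "SWISS-PROT",
    "TREMBL": "TREMBL",
}


def describe_dbref(line: str) -> dict:
    """Classify a DBREF line by database code and by kind of mapping."""
    code = line[26:32].strip()      # columns 27 - 32
    pdb_range = line[14:25]         # columns 15 - 25
    db_range = line[55:68]          # columns 56 - 68
    return {
        "code": code,
        # None when the code is not in the table above.
        "database_name": DATABASE_NAMES.get(code),
        # The PDB entry is its own reference when no sequence
        # database entry has been identified.
        "self_reference": code == "PDB",
        # No sequence numbers on either side: entry-to-entry mapping.
        "entry_level": not pdb_range.strip() and not db_range.strip(),
    }
```

For a record such as the SWISS-PROT example for 3AKY, this reports code "SWS", a segment-level mapping, and no self reference.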

Verification/Validation/Value Authority Control

The sequence database entry found during PDB's search is compared to that provided by the depositor and any differences are resolved or annotated.

Relationships to Other Record Types

DBREF represents the sequence as found in ATOM and HETATM records.

Example

         1         2         3         4         5         6         7
1234567890123456789012345678901234567890123456789012345678901234567890
DBREF  1ABC B    1B   36  PDB    1ABC     1ABC             1B    36

DBREF  3AKY      3   220  SWS    P07170   KAD1_YEAST       5    222

DBREF  1HAN      2   288  GB     397884   X66122           1    287

DBREF  3HSV A    1    92  SWS    P22121   HSF_KLULA      193    284
DBREF  3HSV B    1    92  SWS    P22121   HSF_KLULA      193    284

DBREF  1ARL      1   307  SWS    P00730   CBPA_BOVIN     111    417

DBREF  249D A    1    12  NDB    BDL070   BDL070           1     12
DBREF  249D B   13    24  NDB    BDL070   BDL070          13     24
DBREF  249D C   25    36  NDB    BDL070   BDL070          25     36
DBREF  249D D   37    48  NDB    BDL070   BDL070          37     48
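
The examples above can be reproduced by writing each field at its fixed column position. The following Python sketch is illustrative only: format_dbref and its parameter names are hypothetical, and the justification rules (integers right-justified, strings left-justified, trailing blanks dropped) are an assumption inferred from the example records rather than a rule stated in this section.

```python
def format_dbref(id_code, chain_id, seq_begin, seq_end, database,
                 db_accession, db_id_code, db_seq_begin, db_seq_end,
                 insert_begin=" ", insert_end=" ",
                 db_ins_begin=" ", db_ins_end=" "):
    """Build one DBREF line; pass chain_id=" " for a blank chain identifier."""
    line = (
        # columns 1 - 20: record name, idCode, chainID, seqBegin, insertBegin
        f"DBREF  {id_code:<4} {chain_id:1} {seq_begin:>4}{insert_begin:1} "
        # columns 21 - 42: seqEnd, insertEnd, database, dbAccession
        f"{seq_end:>4}{insert_end:1} {database:<6} {db_accession:<8} "
        # columns 43 - 62: dbIdCode, dbseqBegin, idbnsBeg
        f"{db_id_code:<12} {db_seq_begin:>5}{db_ins_begin:1} "
        # columns 63 - 68: dbseqEnd, dbinsEnd
        f"{db_seq_end:>5}{db_ins_end:1}"
    )
    # Assumption: trailing blanks are dropped, as in the examples above.
    return line.rstrip()


print(format_dbref("3HSV", "A", 1, 92, "SWS", "P22121", "HSF_KLULA", 193, 284))
```

Writing every field with an explicit width keeps later fields in their columns even when the chain identifier or an insertion code is blank.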