The primary structure section of a PDB file contains the sequence of residues in each chain of the macromolecule. Embedded in these records are chain identifiers and sequence numbers that allow other records to link into the sequence.
Overview
The DBREF record provides cross-reference links between PDB sequences and the corresponding database entry or entries. A cross reference to the sequence database is mandatory for each peptide chain with a length greater than ten (10) residues. For nucleic acid entries a DBREF record pointing to the Nucleic Acid Database (NDB) is mandatory when the corresponding entry exists in NDB.
Record Format
COLUMNS       DATA TYPE     FIELD          DEFINITION
--------------------------------------------------------------------------------
 1 -  6       Record name   "DBREF "
 8 - 11       IDcode        idCode         ID code of this entry.
13            Character     chainID        Chain identifier.
15 - 18       Integer       seqBegin       Initial sequence number of the PDB
                                           sequence segment.
19            AChar         insertBegin    Initial insertion code of the PDB
                                           sequence segment.
21 - 24       Integer       seqEnd         Ending sequence number of the PDB
                                           sequence segment.
25            AChar         insertEnd      Ending insertion code of the PDB
                                           sequence segment.
27 - 32       LString       database       Sequence database name. "PDB" when a
                                           corresponding sequence database entry
                                           has not been identified.
34 - 41       LString       dbAccession    Sequence database accession code. For
                                           GenBank entries, this is the NCBI gi
                                           number.
43 - 54       LString       dbIdCode       Sequence database identification code.
                                           For GenBank entries, this is the
                                           accession code.
56 - 60       Integer       dbseqBegin     Initial sequence number of the database
                                           segment.
61            AChar         idbnsBeg       Insertion code of initial residue of the
                                           segment, if PDB is the reference.
63 - 67       Integer       dbseqEnd       Ending sequence number of the database
                                           segment.
68            AChar         dbinsEnd       Insertion code of the ending residue of
                                           the segment, if PDB is the reference.
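The column layout above can be read with plain fixed-width slicing. The following is a minimal sketch in Python, not an official parser: the names `Dbref`, `parse_dbref`, `_field`, and `_int_field` are hypothetical, and the choice to return `None` for blank integer fields is an assumption made here for convenience.

```python
from typing import NamedTuple, Optional


class Dbref(NamedTuple):
    """One DBREF record; attribute names are illustrative, not standardized."""
    id_code: str
    chain_id: str
    seq_begin: Optional[int]
    insert_begin: str
    seq_end: Optional[int]
    insert_end: str
    database: str
    db_accession: str
    db_id_code: str
    dbseq_begin: Optional[int]
    dbins_begin: str
    dbseq_end: Optional[int]
    dbins_end: str


def _field(line: str, first: int, last: int) -> str:
    """Return columns first..last (1-based, inclusive) with blanks stripped."""
    return line[first - 1:last].strip()


def _int_field(line: str, first: int, last: int) -> Optional[int]:
    """Return an integer field, or None when the columns are blank."""
    text = _field(line, first, last)
    return int(text) if text else None


def parse_dbref(line: str) -> Dbref:
    """Split one DBREF line into fields using the documented columns."""
    if not line.startswith("DBREF "):
        raise ValueError("not a DBREF record")
    line = line.ljust(68)  # tolerate lines whose trailing blanks were trimmed
    return Dbref(
        id_code=_field(line, 8, 11),
        chain_id=_field(line, 13, 13),
        seq_begin=_int_field(line, 15, 18),
        insert_begin=_field(line, 19, 19),
        seq_end=_int_field(line, 21, 24),
        insert_end=_field(line, 25, 25),
        database=_field(line, 27, 32),
        db_accession=_field(line, 34, 41),
        db_id_code=_field(line, 43, 54),
        dbseq_begin=_int_field(line, 56, 60),
        dbins_begin=_field(line, 61, 61),
        dbseq_end=_int_field(line, 63, 67),
        dbins_end=_field(line, 68, 68),
    )


# One of the example records for entry 3HSV, chain A.
sample = "DBREF  3HSV A    1    92  SWS    P22121   HSF_KLULA      193    284"
record = parse_dbref(sample)
print(record.database, record.db_accession, record.dbseq_begin, record.dbseq_end)
```

Slicing by column rather than splitting on whitespace matters here because the chain identifier and the insertion codes may be blank, which would shift the fields of a whitespace-split record.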
Details
* PDB entries contain multi-chain molecules with sequences that may be wild type, variant, or synthetic. Sequences may also have been modified through site-directed mutagenesis experiments (engineered). A number of PDB entries report structures of domains cleaved from larger molecules.
* The DBREF record was designed to account for these differences by providing explicit correlations between contiguous segments of the sequences given in the PDB ATOM records and the corresponding sequence database entries. Several cases are easily represented by pointers between the databases using DBREF. PDB entries containing heteropolymers are linked to different sequence database entries. In some cases, such as PDB entries containing immunoglobulin Fab fragments, each chain is linked to two different SWISS-PROT, PIR, and/or GenBank entries, because these databases represent the sequences of the various immunoglobulin domains as separate entries. DBREF can also represent molecules engineered by altering the gene (fusing genes, altering sequences, creating chimeras, or circularly permuting sequences). This design has the additional advantage that pointers can be constructed to other relevant databases such as the Nucleic Acid Database, BioMagResBank, and databases describing sequence motifs (e.g., PROSITE, BLOCKS).
* Database names and their abbreviations as used on DBREF records.
Database name                              database (code in columns 27 - 32)
---------------------------------------------------------------------------
BioMagResBank                              BMRB
BLOCKS                                     BLOCKS
European Molecular Biology Laboratory      EMBL
GenBank                                    GB
Genome Data Base                           GDB
Nucleic Acid Database                      NDB
PROSITE                                    PROSIT
Protein Data Bank                          PDB
Protein Identification Resource            PIR
SWISS-PROT                                 SWS
TREMBL                                     TREMBL
* When no sequence numbers are given (columns 15 - 25 and 56 - 68), then the mapping is between database entries rather than segments within an entry. For example, this is normally used to point to the related NDB entry.
* DBREF records present sequence correlations between PDB ATOM records and the corresponding entries in sequence databases such as PIR, GenBank, or SWISS-PROT.
* PDB does not guarantee that all possible references to the listed databases will be provided. In most cases, only one reference to a sequence database will be provided.
* PDB entries containing chains with missing residues, primarily due to disorder, contain several DBREF records, each linking an observed sequence segment to a sequence database entry.
* If no reference is found in the sequence databases, then the PDB entry itself is given as the reference.
* Selection of the appropriate sequence database entry or entries to be linked to a PDB entry is done on the basis of the sequence and its biological source. Questions on entry assignment that may arise are resolved by consultation with database staff.
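Several of the rules above can be checked mechanically: the database code in columns 27 - 32 names the target database, a code of "PDB" means the entry references itself, and blank sequence-number columns mean the record maps whole entries rather than segments. The sketch below illustrates these three checks in Python. The names `DATABASE_NAMES` and `describe_dbref` are hypothetical, and the second test line is a made-up variant of an example record with its sequence numbers blanked out.

```python
# Database codes as listed in the abbreviation table (columns 27 - 32).
DATABASE_NAMES = {
    "BMRB": "BioMagResBank",
    "BLOCKS": "BLOCKS",
    "EMBL": "European Molecular Biology Laboratory",
    "GB": "GenBank",
    "GDB": "Genome Data Base",
    "NDB": "Nucleic Acid Database",
    "PROSIT": "PROSITE",
    "PDB": "Protein Data Bank",
    "PIR": "Protein Identification Resource",
    "SWS": "SWISS-PROT",
    "TREMBL": "TREMBL",
}


def describe_dbref(line: str) -> dict:
    """Summarize what kind of cross-reference a DBREF line expresses."""
    line = line.ljust(68)  # tolerate lines whose trailing blanks were trimmed
    code = line[26:32].strip()  # columns 27 - 32
    # Columns 15 - 25 and 56 - 68 carry the PDB and database sequence ranges.
    has_ranges = bool(line[14:25].strip() or line[55:68].strip())
    return {
        # Unknown codes are passed through unchanged rather than rejected.
        "database": DATABASE_NAMES.get(code, code),
        "self_reference": code == "PDB",
        "mapping": "segment" if has_ranges else "entry",
    }


# A segment-level record taken from the example records.
segment_line = "DBREF  249D A    1    12  NDB    BDL070   BDL070           1     12"
# Hypothetical entry-level record: same reference, sequence numbers left blank.
entry_line = "DBREF  249D A             NDB    BDL070   BDL070"

print(describe_dbref(segment_line))
print(describe_dbref(entry_line))
```

Passing unknown database codes through unchanged is a design choice of this sketch; a stricter reader might prefer to raise an error instead.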
Verification/Validation/Value Authority Control
The sequence database entry found during PDB's search is compared to that provided by the depositor and any differences are resolved or annotated.
Relationships to Other Record Types
DBREF represents the sequence as found in ATOM and HETATM records.
Example
         1         2         3         4         5         6         7
1234567890123456789012345678901234567890123456789012345678901234567890
DBREF  1ABC B    1B   36  PDB    1ABC     1ABC             1B    36
DBREF  3AKY      3   220  SWS    P07170   KAD1_YEAST       5    222
DBREF  1HAN      2   288  GB     397884   X66122           1    287
DBREF  3HSV A    1    92  SWS    P22121   HSF_KLULA      193    284
DBREF  3HSV B    1    92  SWS    P22121   HSF_KLULA      193    284
DBREF  1ARL      1   307  SWS    P00730   CBPA_BOVIN     111    417
DBREF  249D A    1    12  NDB    BDL070   BDL070           1     12
DBREF  249D B   13    24  NDB    BDL070   BDL070          13     24
DBREF  249D C   26    36  NDB    BDL070   BDL070          26     36
DBREF  249D D   37    48  NDB    BDL070   BDL070          37     48
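Because each DBREF record pairs a PDB segment with a database segment, it can be used to translate a PDB residue number into the numbering of the database sequence. The sketch below shows one way to do that in Python. The function name `pdb_to_db_number` is hypothetical, and the calculation assumes the segment is contiguous and numbered without insertion codes or gaps, which a given entry may not satisfy.

```python
def pdb_to_db_number(dbref_line: str, pdb_resnum: int) -> int:
    """Map a PDB residue number to the database numbering for one DBREF line.

    Assumes a contiguous segment with no insertion codes, so the two
    numbering schemes differ by a constant offset.
    """
    line = dbref_line.ljust(68)
    seq_begin = int(line[14:18])    # columns 15 - 18
    seq_end = int(line[20:24])      # columns 21 - 24
    dbseq_begin = int(line[55:60])  # columns 56 - 60
    if not seq_begin <= pdb_resnum <= seq_end:
        raise ValueError("residue number lies outside the DBREF segment")
    return dbseq_begin + (pdb_resnum - seq_begin)


hsv = "DBREF  3HSV A    1    92  SWS    P22121   HSF_KLULA      193    284"
aky = "DBREF  3AKY      3   220  SWS    P07170   KAD1_YEAST       5    222"

print(pdb_to_db_number(hsv, 10))  # 193 + (10 - 1) = 202
print(pdb_to_db_number(aky, 3))   # first residue of the segment maps to 5
```

For the 3HSV records the offset is 192, so PDB residues 1 to 92 correspond to residues 193 to 284 of the SWISS-PROT entry.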