PDB Format Guide, Part 36

SEQRES

Overview

SEQRES records contain the amino acid or nucleic acid sequence of residues in each chain of the macromolecule that was studied.

Record Format

COLUMNS        DATA TYPE       FIELD         DEFINITION
---------------------------------------------------------------------------------
 1 -  6        Record name     "SEQRES"

 9 - 10        Integer         serNum        Serial number of the SEQRES record
                                             for the current chain.  Starts at 1
                                             and increments by one each line.
                                             Reset to 1 for each chain.

12             Character       chainID       Chain identifier.  This may be any
                                             single legal character, including a
                                             blank which is used if there is
                                             only one chain.

14 - 17        Integer         numRes        Number of residues in the chain.
                                             This value is repeated on every
                                             record.

20 - 22        Residue name    resName       Residue name.

24 - 26        Residue name    resName       Residue name.

28 - 30        Residue name    resName       Residue name.

32 - 34        Residue name    resName       Residue name.

36 - 38        Residue name    resName       Residue name.

40 - 42        Residue name    resName       Residue name.

44 - 46        Residue name    resName       Residue name.

48 - 50        Residue name    resName       Residue name.

52 - 54        Residue name    resName       Residue name.

56 - 58        Residue name    resName       Residue name.

60 - 62        Residue name    resName       Residue name.

64 - 66        Residue name    resName       Residue name.

68 - 70        Residue name    resName       Residue name.
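
The column table above maps directly onto fixed-width string slicing. The following is a minimal Python sketch, not part of the format definition; the function name parse_seqres_line and the tuple it returns are illustrative assumptions.

```python
def parse_seqres_line(line):
    """Split one SEQRES record into its fields using the column table above."""
    if not line.startswith("SEQRES"):
        raise ValueError("not a SEQRES record")
    # Pad short records so that every slice below is in range.
    line = line.rstrip("\n").ljust(70)
    # Columns in the table are 1-based and inclusive;
    # Python slices are 0-based and end-exclusive.
    ser_num = int(line[8:10])      # columns 9 - 10
    chain_id = line[11]            # column 12 (may be a blank)
    num_res = int(line[13:17])     # columns 14 - 17
    # Thirteen residue fields: columns 20 - 22, 24 - 26, ..., 68 - 70.
    names = [line[start:start + 3].strip() for start in range(19, 70, 4)]
    residues = [name for name in names if name]
    return ser_num, chain_id, num_res, residues
```

Empty residue fields on the last record of a chain are dropped, so the returned list holds only the names actually present on that line.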

Details

* PDB entries use the three-letter abbreviation for amino acid names and the one-letter code for nucleic acids.

* In the case of non-standard groups, a hetID of up to three (3) alphanumeric characters is used. Common HET names appear in the HET dictionary.

* Each covalently contiguous sequence of residues (connected via the "backbone" atoms) is represented as an individual chain.

* Heterogens which are integrated into the backbone of the chain are listed as being part of the chain and are included in the SEQRES records for that chain.

* Each set of SEQRES records and each HET group is assigned a component number. The component number is assigned serially beginning with 1 for the first set of SEQRES records. This number is given explicitly in the FORMUL record, but only implicitly in the SEQRES record.

* The SEQRES records must list residues present in the molecule studied, even if the coordinates are not present.

* C- and N-terminal residues for which no coordinates are provided due to disorder must be listed on SEQRES.

* All occurrences of standard amino acid or nucleic acid residues (ATOM records) must be listed on a SEQRES record. This implies that a numRes of 1 is valid.

* No distinction is made between ribo- and deoxyribonucleotides in the SEQRES records. These residues are identified with the same residue name (i.e., A, C, G, T, U, I).

* If the entire residue sequence is unknown, the serNum in column 10 is "0", the number of residues thought to comprise the molecule is entered as numRes in columns 14 - 17, and resName in columns 20 - 22 is "UNK".

* In case of microheterogeneity, only one of the sequences is presented. A REMARK is generated to explain this and a SEQADV is also generated.
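
The layout rules can also be read in the writing direction: thirteen residue names per record, serNum restarting at 1 for each chain, and numRes repeated on every line. The sketch below is illustrative only. The name format_seqres is hypothetical, right-justifying names shorter than three characters (such as the one-letter nucleic acid codes) is an assumption, and the unknown-sequence case (serNum "0", resName "UNK") is not handled.

```python
def format_seqres(chain_id, residues):
    """Return the SEQRES lines for one chain, thirteen residue names per record."""
    num_res = len(residues)
    lines = []
    for ser_num, start in enumerate(range(0, num_res, 13), start=1):
        chunk = residues[start:start + 13]
        # Assumption: names shorter than three characters are right-justified.
        fields = " ".join(f"{name:>3}" for name in chunk)
        # serNum ends in column 10, chainID sits in column 12, numRes ends in
        # column 17, and the first residue name starts in column 20.
        lines.append(f"SEQRES{ser_num:>4} {chain_id}{num_res:>5}  {fields}")
    return lines
```

The field widths only hold while serNum fits in two digits and numRes fits in four; a chain long enough to exceed either would need separate handling.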

Verification/Validation/Value Authority Control

The residues presented on the SEQRES records must agree with those found in the ATOM records.

The SEQRES records are checked by PDB using the sequence databases and information provided by the depositor.

SEQRES is compared to the ATOM records during processing, and both are checked against the sequence database. All discrepancies are either resolved or annotated in the entry.

Relationships to Other Record Types

The residues presented on the SEQRES records must agree with those found in the ATOM records. DBREF refers to the corresponding entry in the sequence databases. SEQADV lists all discrepancies between the entry's sequence for which there are coordinates and that referenced in the sequence database. MODRES describes modifications to a standard residue.

Example

         1         2         3         4         5         6         7
1234567890123456789012345678901234567890123456789012345678901234567890
SEQRES   1 A   21  GLY ILE VAL GLU GLN CYS CYS THR SER ILE CYS SER LEU
SEQRES   2 A   21  TYR GLN LEU GLU ASN TYR CYS ASN
SEQRES   1 B   30  PHE VAL ASN GLN HIS LEU CYS GLY SER HIS LEU VAL GLU
SEQRES   2 B   30  ALA LEU TYR LEU VAL CYS GLY GLU ARG GLY PHE PHE TYR
SEQRES   3 B   30  THR PRO LYS ALA
SEQRES   1 C   21  GLY ILE VAL GLU GLN CYS CYS THR SER ILE CYS SER LEU
SEQRES   2 C   21  TYR GLN LEU GLU ASN TYR CYS ASN
SEQRES   1 D   30  PHE VAL ASN GLN HIS LEU CYS GLY SER HIS LEU VAL GLU
SEQRES   2 D   30  ALA LEU TYR LEU VAL CYS GLY GLU ARG GLY PHE PHE TYR
SEQRES   3 D   30  THR PRO LYS ALA
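
Records such as those in the example can be regrouped into one sequence per chain, and the serNum and numRes rules checked along the way. The following Python sketch is one possible approach, not a reference implementation. The name read_chains is hypothetical, residue names are assumed to be separated by whitespace, and the unknown-sequence case (serNum "0") is not handled.

```python
def read_chains(lines):
    """Collect residue names per chain, checking serNum order and numRes."""
    chains = {}
    for line in lines:
        if not line.startswith("SEQRES"):
            continue
        ser_num = int(line[8:10])      # columns 9 - 10
        chain_id = line[11]            # column 12
        num_res = int(line[13:17])     # columns 14 - 17
        names = line[19:70].split()    # residue fields, columns 20 - 70
        chain = chains.setdefault(
            chain_id, {"num_res": num_res, "records": 0, "residues": []}
        )
        chain["records"] += 1
        # serNum starts at 1 for each chain and increments by one per line.
        if ser_num != chain["records"]:
            raise ValueError(f"chain {chain_id!r}: serNum {ser_num} out of order")
        # numRes is repeated unchanged on every record of the chain.
        if num_res != chain["num_res"]:
            raise ValueError(f"chain {chain_id!r}: numRes differs between records")
        chain["residues"].extend(names)
    for chain_id, chain in chains.items():
        if len(chain["residues"]) != chain["num_res"]:
            raise ValueError(f"chain {chain_id!r}: residue count does not match numRes")
    return {chain_id: chain["residues"] for chain_id, chain in chains.items()}
```

Applied to the ten records above, this would return four chains, with 21 residue names each for A and C and 30 each for B and D.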

Known Problems

Polysaccharides do not lend themselves to being represented in SEQRES.

There is no mechanism provided to describe sequence runs when the exact ordering of the sequence is not known.

For cyclic peptides, PDB arbitrarily assigns a residue as the N-terminus.

For microheterogeneity only one of the possible residues in a given position is provided in SEQRES.

No distinction is made between ribo- and deoxyribonucleotides in the SEQRES records. These residues are identified with the same residue name (i.e., A, C, G, T, U, I).