Overview
SEQRES records contain the amino acid or nucleic acid sequence of residues in each chain of the macromolecule that was studied.
Record Format
COLUMNS        DATA TYPE      FIELD        DEFINITION
---------------------------------------------------------------------------------
 1 -  6        Record name    "SEQRES"
 9 - 10        Integer        serNum       Serial number of the SEQRES record for the
                                           current chain. Starts at 1 and increments
                                           by one each line. Reset to 1 for each chain.
12             Character      chainID      Chain identifier. This may be any single
                                           legal character, including a blank which
                                           is used if there is only one chain.
14 - 17        Integer        numRes       Number of residues in the chain. This
                                           value is repeated on every record.
20 - 22        Residue name   resName      Residue name.
24 - 26        Residue name   resName      Residue name.
28 - 30        Residue name   resName      Residue name.
32 - 34        Residue name   resName      Residue name.
36 - 38        Residue name   resName      Residue name.
40 - 42        Residue name   resName      Residue name.
44 - 46        Residue name   resName      Residue name.
48 - 50        Residue name   resName      Residue name.
52 - 54        Residue name   resName      Residue name.
56 - 58        Residue name   resName      Residue name.
60 - 62        Residue name   resName      Residue name.
64 - 66        Residue name   resName      Residue name.
68 - 70        Residue name   resName      Residue name.
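The fixed column positions above translate directly into string slices. The following is a minimal Python sketch, not part of the format definition; the names `SeqresRecord` and `parse_seqres_line` are illustrative assumptions. It converts the 1-based column ranges of the table into 0-based slices and reads up to thirteen residue names from one line.

```python
from typing import List, NamedTuple


class SeqresRecord(NamedTuple):
    """One parsed SEQRES line (hypothetical container, not defined by the format)."""
    ser_num: int
    chain_id: str
    num_res: int
    residues: List[str]


def parse_seqres_line(line: str) -> SeqresRecord:
    """Parse a single SEQRES line using the column table above."""
    if not line.startswith("SEQRES"):
        raise ValueError("not a SEQRES record")
    # Pad to 70 columns so that short final lines can be sliced safely.
    line = line.rstrip("\n").ljust(70)
    ser_num = int(line[8:10])     # columns 9 - 10
    chain_id = line[11]           # column 12 (may be a blank)
    num_res = int(line[13:17])    # columns 14 - 17
    residues = []
    # Thirteen residue fields: columns 20 - 22, 24 - 26, ..., 68 - 70.
    for start in range(19, 70, 4):
        name = line[start:start + 3].strip()
        if name:
            residues.append(name)
    return SeqresRecord(ser_num, chain_id, num_res, residues)
```

For instance, `parse_seqres_line("SEQRES   2 A   21  TYR GLN LEU GLU ASN TYR CYS ASN")` yields serNum 2, chainID "A", numRes 21 and eight residue names.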
Details
* PDB entries use the three-letter abbreviation for amino acid names and the one-letter code for nucleic acids.
* In the case of non-standard groups, a hetID of up to three (3) alphanumeric characters is used. Common HET names appear in the HET dictionary.
* Each covalently contiguous sequence of residues (connected via the "backbone" atoms) is represented as an individual chain.
* Heterogens which are integrated into the backbone of the chain are listed as being part of the chain and are included in the SEQRES records for that chain.
* Each set of SEQRES records and each HET group is assigned a component number. The component number is assigned serially beginning with 1 for the first set of SEQRES records. This number is given explicitly in the FORMUL record, but only implicitly in the SEQRES record.
* The SEQRES records must list residues present in the molecule studied, even if the coordinates are not present.
* C- and N-terminus residues for which no coordinates are provided due to disorder must be listed on SEQRES.
* All occurrences of standard amino or nucleic acid residues (ATOM records) must be listed on a SEQRES record. This implies that a numRes of 1 is valid.
* No distinction is made between ribo- and deoxyribonucleotides in the SEQRES records. These residues are identified with the same residue name (i.e., A, C, G, T, U, I).
* If the entire residue sequence is unknown, the serNum in column 10 is "0", the number of residues thought to comprise the molecule is entered as numRes in columns 14 - 17, and resName in columns 20 - 22 is "UNK".
* In case of microheterogeneity, only one of the sequences is presented. A REMARK is generated to explain this and a SEQADV is also generated.
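Several of the rules above can be checked mechanically: serNum starts at 1 and increments per line within a chain, numRes is repeated on every record, and a serNum of "0" marks an entirely unknown sequence. The Python sketch below illustrates one way to do this; `assemble_chains` is a hypothetical helper name, and raising `ValueError` on a violation is an assumption of this sketch, not behavior prescribed by the format.

```python
def assemble_chains(lines):
    """Group SEQRES lines by chain ID and check serNum / numRes consistency.

    Returns a dict mapping chainID to the list of residue names.
    """
    chains = {}       # chainID -> residue names, in file order
    declared = {}     # chainID -> numRes from the first record of the chain
    line_count = {}   # chainID -> number of SEQRES lines seen so far
    unknown = set()   # chains flagged with serNum 0 (sequence unknown)

    for line in lines:
        if not line.startswith("SEQRES"):
            continue
        line = line.rstrip("\n").ljust(70)
        ser_num = int(line[8:10])     # columns 9 - 10
        chain_id = line[11]           # column 12
        num_res = int(line[13:17])    # columns 14 - 17
        names = line[19:70].split()   # residue fields in columns 20 - 70

        if ser_num == 0:
            # Entire sequence unknown: numRes is only an estimate and
            # the single resName is expected to be "UNK".
            unknown.add(chain_id)
            chains[chain_id] = names
            declared[chain_id] = num_res
            continue

        count = line_count.get(chain_id, 0) + 1
        if ser_num != count:
            raise ValueError(f"chain {chain_id!r}: serNum {ser_num}, expected {count}")
        line_count[chain_id] = count

        if declared.setdefault(chain_id, num_res) != num_res:
            raise ValueError(f"chain {chain_id!r}: numRes differs between records")
        chains.setdefault(chain_id, []).extend(names)

    for chain_id, seq in chains.items():
        if chain_id not in unknown and len(seq) != declared[chain_id]:
            raise ValueError(f"chain {chain_id!r}: {len(seq)} residues listed, "
                             f"numRes is {declared[chain_id]}")
    return chains
```

The length check is skipped for chains with serNum 0, since numRes there is the number of residues thought to comprise the molecule rather than a count of listed names.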
Verification/Validation/Value Authority Control
The residues presented on the SEQRES records must agree with those found in the ATOM records.
The SEQRES records are checked by PDB using the sequence databases and information provided by the depositor.
SEQRES is compared to the ATOM records during processing, and both are checked against the sequence database. All discrepancies are either resolved or annotated in the entry.
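As a rough illustration of the SEQRES-versus-ATOM comparison, the Python sketch below tests whether the residues that have coordinates occur in the SEQRES sequence in the same order, allowing gaps for residues without coordinates. This is a simplified stand-in, not the procedure used by PDB during processing; the function name `is_ordered_subset` is hypothetical, and a real check would also use residue numbering and the sequence database.

```python
def is_ordered_subset(atom_residues, seqres_residues):
    """Return True if atom_residues occur in seqres_residues in order.

    Gaps are allowed, because SEQRES may list residues (for example
    disordered termini) for which no coordinates are present.
    """
    remaining = iter(seqres_residues)
    # "name in remaining" advances the iterator past the first match,
    # so each ATOM residue must be found after the previous one.
    return all(name in remaining for name in atom_residues)
```

For example, with SEQRES residues `["GLY", "ILE", "VAL", "GLU", "GLN"]`, the ATOM residues `["ILE", "VAL", "GLN"]` pass the check, while `["VAL", "ILE"]` do not.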
Relationships to Other Record Types
The residues presented on the SEQRES records must agree with those found in the ATOM records. DBREF refers to the corresponding entry in the sequence databases. SEQADV lists all discrepancies between the entry's sequence for which there are coordinates and that referenced in the sequence database. MODRES describes modifications to a standard residue.
Example
         1         2         3         4         5         6         7
1234567890123456789012345678901234567890123456789012345678901234567890
SEQRES   1 A   21  GLY ILE VAL GLU GLN CYS CYS THR SER ILE CYS SER LEU
SEQRES   2 A   21  TYR GLN LEU GLU ASN TYR CYS ASN
SEQRES   1 B   30  PHE VAL ASN GLN HIS LEU CYS GLY SER HIS LEU VAL GLU
SEQRES   2 B   30  ALA LEU TYR LEU VAL CYS GLY GLU ARG GLY PHE PHE TYR
SEQRES   3 B   30  THR PRO LYS ALA
SEQRES   1 C   21  GLY ILE VAL GLU GLN CYS CYS THR SER ILE CYS SER LEU
SEQRES   2 C   21  TYR GLN LEU GLU ASN TYR CYS ASN
SEQRES   1 D   30  PHE VAL ASN GLN HIS LEU CYS GLY SER HIS LEU VAL GLU
SEQRES   2 D   30  ALA LEU TYR LEU VAL CYS GLY GLU ARG GLY PHE PHE TYR
SEQRES   3 D   30  THR PRO LYS ALA
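Records like those in the example can be generated from a residue list by writing thirteen names per line into the fixed columns. The Python sketch below is one possible formatter; `format_seqres` is a hypothetical name, and right-justifying names shorter than three characters (such as one-letter nucleic acid codes) is an assumption of this sketch.

```python
def format_seqres(chain_id, residues, per_line=13):
    """Return SEQRES lines for one chain, thirteen residue names per line."""
    num_res = len(residues)
    lines = []
    for i in range(0, num_res, per_line):
        ser_num = i // per_line + 1   # restarts at 1 for each chain
        # Each residue field is three columns wide, separated by one blank.
        # Assumption: shorter names are right-justified within the field.
        names = " ".join(f"{name:>3}" for name in residues[i:i + per_line])
        # Columns: 1-6 record name, 9-10 serNum, 12 chainID, 14-17 numRes,
        # residue fields from column 20 onward.
        lines.append(f"SEQRES  {ser_num:2d} {chain_id:1} {num_res:4d}  {names}")
    return lines
```

Calling it with chain ID "A" and the 21 residues of chain A from the example reproduces the first two SEQRES lines shown there. Trailing blanks are not padded out to a fixed line width in this sketch.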
Known Problems
Polysaccharides do not lend themselves to being represented in SEQRES.
There is no mechanism provided to describe sequence runs when the exact ordering of the sequence is not known.
For cyclic peptides, PDB arbitrarily assigns a residue as the N-terminus.
For microheterogeneity only one of the possible residues in a given position is provided in SEQRES.
No distinction is made between ribo- and deoxyribonucleotides in the SEQRES records. These residues are identified with the same residue name (i.e., A, C, G, T, U).