This document addresses some questions about the Protein Data Bank (PDB) format that have been frequently posed by depositors and users over the past year. The information in this document has been gathered from the PDB Contents Guide document originally created at Brookhaven National Laboratory, a careful study of existing files, an RCSB Workshop held in October 1998, and discussion with many users of the data. The guidelines presented here are those used by the annotation staff at the RCSB-PDB.
This will be an evolving document. Questions, comments or suggestions about this document should be sent to format-faq@rcsb.rutgers.edu.
A: All residues in the crystal or in solution, including residues not present in the model (e.g., disordered residues, residues lacking electron density, cloning artifacts, HIS tags), are included in the SEQRES records.
Q: How are sequence differences between coordinate and SEQRES records handled (e.g., residues modeled as ALA, mutations, unknown residues)?
A: The residue names in the coordinate section should match the residue names in the SEQRES records, even if this involves having missing atoms. Residues modeled as ALA due to lack of side chain density are relabeled to match the SEQRES. The missing side chain atoms are listed in REMARK 470 of the PDB file.
Example:
SEQRES: MET GLU ASN SER ALA GLU PRO GLU GLN SER LEU VAL CYS GLN
COORDS: MET GLU ASN SER ALA GLU PRO GLU GLN SER LEU ala CYS GLN
                                                    ^^^
In the example above, residue VAL in the SEQRES is modeled as ALA in the coordinates. The residue name in the coordinates is changed from ALA to VAL to remove the conflict as shown below.
SEQRES: MET GLU ASN SER ALA GLU PRO GLU GLN SER LEU VAL CYS GLN
COORDS: MET GLU ASN SER ALA GLU PRO GLU GLN SER LEU VAL CYS GLN
                                                    ^^^
If there is a mutation from a natural sequence, the sequence including the mutation appears in the SEQRES records. It is irrelevant here what the wildtype sequence is. In the previous example, if the residue ALA was a point mutation, the sequence in the SEQRES records and the coordinates must match and are both labeled ALA.
If the identity of a residue is truly unknown, it is labeled UNK in both the coordinates and the SEQRES.
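The comparison described above can be sketched in a few lines of Python. This is only an illustration: the function name `find_sequence_conflicts` is hypothetical, and it assumes the two residue lists already cover the same positions (residues missing from the model would have to be aligned first).

```python
def find_sequence_conflicts(seqres, coords):
    """Return (position, seqres_name, coord_name) for each mismatch.

    Both arguments are lists of three-letter residue names that cover
    the same sequence positions. Positions are reported 1-based.
    """
    if len(seqres) != len(coords):
        raise ValueError("sequences must cover the same positions")
    return [
        (position, s, c)
        for position, (s, c) in enumerate(zip(seqres, coords), start=1)
        if s.upper() != c.upper()
    ]


# The example from this answer: VAL in SEQRES was modeled as ALA.
seqres = "MET GLU ASN SER ALA GLU PRO GLU GLN SER LEU VAL CYS GLN".split()
coords = "MET GLU ASN SER ALA GLU PRO GLU GLN SER LEU ALA CYS GLN".split()
conflicts = find_sequence_conflicts(seqres, coords)
print(conflicts)
```

Each reported conflict is a position where the coordinate residue name would be relabeled to match the SEQRES, with the missing side chain atoms noted in REMARK 470.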
Q. What do the DBREF and SEQADV records represent?
A. The DBREF provides a cross-reference between the sequence listed in the SEQRES record and an entry in a sequence database (e.g., GenBank, or SWISS-PROT).
The SEQADV record identifies conflicts between sequence information given in the SEQRES and the sequence database entry given in the DBREF (for example, an engineered mutation). Residues missing in the coordinates are listed in REMARK 465, and are not listed in the SEQADV record.
Q. How is polymorphism/microheterogeneity described?
A. It is well recognized that microheterogeneity is not well represented by the PDB format. The current practice is described here.
Although microheterogeneity does not present a problem within the coordinate records, it does introduce a difficulty in the specification of the sequence in the SEQRES records where only a single residue may be specified for each sequence position.
In cases where a single sequence position is modeled as different residues and these residues differ with respect to occupancy, the residue with the higher occupancy is used to define the SEQRES sequence.
If the different residue models cannot be distinguished by occupancy, then the SEQRES sequence is defined using the residue which matches the sequence obtained from the sequence database reference.
For example, residue 60 has two isoforms (SER and VAL) modeled with equal occupancies; however, residue SER matches the sequence database reference. Residue SER is listed in the SEQRES, since it matches the sequence database reference. Residue SER is listed in the coordinates as residue 60, conformation A. Residue VAL is listed next also as residue 60, conformation B.
REMARK 999 is also added to the entry to explain the presence of the isoforms. The combined occupancies of the two isoforms should be less than or equal to 1.00; however, the interpretation of occupancies can be more complicated if each of the isoforms is individually disordered (as in PDB 1EJG).
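The selection rule for the SEQRES residue can be written as a small decision function. This is a minimal sketch, not annotation software: the function name, the dictionary input, and the tolerance used to decide that two occupancies are "equal" are all assumptions made for illustration.

```python
def choose_seqres_residue(occupancies, reference=None, tol=0.005):
    """Pick the residue name to list in SEQRES for one sequence position.

    occupancies: dict mapping residue name -> occupancy of that isoform.
    reference:   residue name from the sequence database, if known.
    tol:         assumed tolerance for treating occupancies as equal.
    """
    highest = max(occupancies.values())
    tied = [name for name, occ in occupancies.items() if highest - occ < tol]
    if len(tied) == 1:
        # Occupancy alone distinguishes the isoforms.
        return tied[0]
    if reference in tied:
        # Equal occupancies: fall back on the database reference.
        return reference
    raise ValueError("tie cannot be resolved without a matching database residue")


# Residue 60 from the example: SER and VAL with equal occupancies.
print(choose_seqres_residue({"SER": 0.50, "VAL": 0.50}, reference="SER"))
```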
Q. How are three-letter residue names and chain identifiers for residue modifications and ligands assigned?
A. There are five common cases: covalently bound ligands, protein residue modifications, nucleotide modifications, metal coordination interactions (for example the interaction between iron and histidine in hemoglobin), and non-covalently bound ligands.
Covalently bound ligands:
A traditional distinction has been made between covalently bound ligands and small residue modifications. A bound ligand is defined as a modification with greater than 10 atoms including hydrogens.
Covalently bound ligands are assigned the chain identifier of the polymer chain to which the ligand is bound. The bonding between the ligand and the residue is specified in PDB LINK records. The ligand coordinates appear as HETATM records following either the TER record for the bound chain or the TER record for the last polymer chain. The ligand is assigned a unique residue number within the chain to which it is bound. The residue that binds the ligand retains its standard name in both coordinate and SEQRES records. Additional PDB records MODRES/HET/HETNAM/FORMUL and CONECT are provided to describe the ligand.
Protein residue modifications:
The residue including the modification is assigned a unique 3-letter code. This code is used to identify the modified residue in both the coordinate and SEQRES records. The coordinate records of the modified residue are labeled as HETATM records. These records are inserted in the correct sequence position in the atom list for the polymer chain in which the modification resides. Additional PDB records MODRES/HET/HETNAM/FORMUL and CONECT are provided to define the modification.
A distinction has been made between small residue modifications and covalently bound ligands. A residue modification is defined as a modification of 10 or fewer atoms including hydrogens.
Noteworthy exceptions to the above treatment of modified residues are the cases of acetylation of the N-terminus (residue ACE) and amidation of the C-terminus (residue NH2). Although these cases could be treated as residue modifications which would be assigned new 3-letter codes, these modifications have traditionally been treated as independent residues which appear in both the coordinates and SEQRES records.
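The 10-atom distinction and the two terminal exceptions can be summarized as follows. This is a hedged sketch: the function name and the returned labels are hypothetical, and the atom count is assumed to include hydrogens, as stated above.

```python
ATOM_THRESHOLD = 10            # atoms, including hydrogens
TERMINAL_CAPS = {"ACE", "NH2"}  # traditionally kept as independent residues


def classify_covalent_group(atom_count, name=None):
    """Classify a covalently attached group by the rules in this answer."""
    if name in TERMINAL_CAPS:
        return "independent residue"
    if atom_count > ATOM_THRESHOLD:
        return "covalently bound ligand"
    return "residue modification"


# PLP has the formula C8 H10 N1 O6 P1, i.e. 26 atoms.
print(classify_covalent_group(8 + 10 + 1 + 6 + 1, name="PLP"))
# A methyl group (CH3) has 4 atoms.
print(classify_covalent_group(4))
```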
Nucleotide modifications:
DNA and RNA
Modified DNA nucleotide names are prefixed with a "+" character. The coordinate records for the modified nucleotide are identified as HETATM records. The coordinate records for the modified portion of the nucleotide are inserted at the end of the DNA polymer chain after the TER record. These coordinates carry the residue number and chain ID of the nucleotide they modify; however, they carry the residue name corresponding to the particular modification (e.g., a methyl modification may be identified by the name CH3). The one-letter code is preceded by a "+" character in the SEQRES record. Additional PDB records MODRES/HET/HETNAM/FORMUL and CONECT are provided to define the modification.
tRNA
Nucleotide modifications in tRNA are handled in a manner analogous to protein residue modifications. Nucleotides in tRNA structures are specified using 3-letter codes.
Metal coordination:
Ligands interacting with a single chain of a macromolecule through metal coordination are assigned the chain identifier of the residue in the polymer chain to which the ligand is bound. Ligands interacting through metal coordination are assigned a unique residue number within the chain to which they bind.
Non-covalently bound ligands:
Ligands which are not covalently bound or metals which coordinate with multiple chains are not assigned PDB chain identifiers. Non-covalently bound polymer-like ligands which are composed of discrete units broken down in a chemically sensible manner, may be left as a grouping of multiple three-letter codes. The connections within the ligand are provided in LINK records.
Ligands covalently binding multiple polymer chains:
In the case of a ligand binding multiple chains through covalent bonds to each, the connecting group (regardless of size) is assigned its own three-letter code, but no chain ID. Polymer residues which bind the connecting ligand retain their standard names and these will appear in both coordinate and SEQRES records. MODRES/HET/HETNAM/FORMUL and CONECT records are provided to further define the modification.
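The chain ID rules for ligands given in the cases above reduce to one question: does the ligand attach, covalently or through metal coordination, to exactly one polymer chain? The sketch below encodes that reading. The function name and the interaction labels are hypothetical and are not part of the PDB format.

```python
def assign_ligand_chain_id(interaction, bound_chains):
    """Return the chain ID a ligand should carry, or None for no chain ID.

    interaction:  "covalent", "metal", or "noncovalent" (assumed labels).
    bound_chains: chain IDs of the polymer chains the ligand interacts with.
    """
    chains = sorted(set(bound_chains))
    if interaction in ("covalent", "metal") and len(chains) == 1:
        # Bound or coordinated to a single chain: inherit that chain's ID.
        return chains[0]
    # Non-covalent ligands, and ligands bridging several chains, get none.
    return None


print(assign_ligand_chain_id("covalent", ["A"]))          # e.g. PLP in 1D2F
print(assign_ligand_chain_id("metal", ["A"]))             # e.g. HEM in 6HBW
print(assign_ligand_chain_id("noncovalent", ["A", "B"]))  # e.g. XV6 in 1BV7
```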
Some Examples:
Example - Covalently bound ligand from PDB 1D2F:
In this example, the vitamin B6 complex (three-letter code PLP) is covalently bound to atom NZ of LYS A 233. Since the complex has more than 10 atoms, it retains its original three-letter code, as does the LYS to which it is bound. MODRES/HET/HETNAM/FORMUL and CONECT records define the modification. The coordinates for the ligand appear in HETATM records following the TER card for chain B, the last polymer chain shown in the snippet. The complex is given a unique residue number 400, but the same chain ID as the residue to which it is bound.
PDB file snippet from 1D2F:
HETNAM PLP PYRIDOXAL-5'-PHOSPHATE
HETSYN PLP VITAMIN B6 COMPLEX
FORMUL 3 PLP 2(C8 H10 N1 O6 P1)
LINK NZ LYS A 233 C4A PLP A 400
ATOM 1607 N LYS A 233 -26.180 4.759 -18.385 1.00 41.53 N
ATOM 1608 CA LYS A 233 -25.155 5.537 -17.709 1.00 43.81 C
ATOM 1609 C LYS A 233 -24.504 4.737 -16.581 1.00 45.17 C
ATOM 1610 O LYS A 233 -23.286 4.808 -16.389 1.00 46.70 O
ATOM 1611 CB LYS A 233 -25.757 6.839 -17.173 1.00 43.94 C
ATOM 1612 CG LYS A 233 -24.979 8.090 -17.576 1.00 42.51 C
ATOM 1613 CD LYS A 233 -24.573 8.053 -19.047 1.00 41.96 C
ATOM 1614 CE LYS A 233 -24.260 9.440 -19.598 1.00 44.08 C
ATOM 1615 NZ LYS A 233 -25.512 10.211 -19.569 1.00 47.48 N
ATOM 5715 CZ ARG B 390 -64.060 -24.212 -55.642 1.00 62.13 C
ATOM 5716 NH1 ARG B 390 -63.244 -25.248 -55.485 1.00 61.76 N
ATOM 5717 NH2 ARG B 390 -64.491 -23.898 -56.854 1.00 60.32 N
ATOM 5718 OXT ARG B 390 -68.512 -25.126 -51.002 1.00 56.62 O
TER 5719 ARG B 390
HETATM 5720 N1 PLP A 400 -29.825 12.803 -19.612 1.00 54.54 N
HETATM 5721 C2 PLP A 400 -28.671 13.193 -18.934 1.00 54.96 C
HETATM 5722 C2A PLP A 400 -28.901 14.286 -17.914 1.00 49.22 C
HETATM 5723 C3 PLP A 400 -27.414 12.482 -19.296 1.00 53.32 C
HETATM 5724 O3 PLP A 400 -26.339 12.929 -18.643 1.00 56.64 O
HETATM 5725 C4 PLP A 400 -27.430 11.476 -20.261 1.00 53.62 C
HETATM 5726 C4A PLP A 400 -26.181 10.773 -20.595 1.00 50.98 C
HETATM 5727 C5 PLP A 400 -28.753 11.154 -20.908 1.00 53.86 C
HETATM 5728 C6 PLP A 400 -29.903 11.836 -20.560 1.00 55.27 C
Example - Residue modification from PDB 1CLV:
In this example, a glutamine residue is modified to make pyroglutamate (5HP). Since this modification has fewer than 10 atoms, the entire residue including the modification is renamed with a new three-letter code. This code appears in the SEQRES records. The modification carries the same chain ID and residue number as the glutamine which it is modifying. The coordinate records for the entire residue and modification are inserted in the correct sequence position in the atom list for the polymer chain in which the modification resides.
PDB file snippet from 1CLV:
SEQRES 1 A 471 5HP LYS ASP ALA ASN PHE ALA SER GLY ARG ASN SER ILE
MODRES 1CLV 5HP A 1 GLU PYROGLUTAMIC ACID
HET 5HP A 1 8
HETNAM 5HP PYROGLUTAMIC ACID
FORMUL 1 5HP C5 H7 N1 O3
HETATM 1 N 5HP A 1 29.020 7.713 8.323 1.00 17.69 N
HETATM 2 CA 5HP A 1 30.380 8.263 8.128 1.00 16.55 C
HETATM 3 C 5HP A 1 30.667 8.643 6.676 1.00 13.70 C
HETATM 4 O 5HP A 1 31.514 9.493 6.417 1.00 14.12 O
HETATM 5 CB 5HP A 1 31.390 7.193 8.612 1.00 16.19 C
HETATM 6 CG 5HP A 1 30.495 5.943 8.987 1.00 16.93 C
HETATM 7 CD 5HP A 1 29.101 6.476 8.787 1.00 19.39 C
HETATM 8 OD 5HP A 1 28.089 5.796 9.037 1.00 22.92 O
ATOM 9 N LYS A 2 29.983 7.994 5.735 1.00 14.51 N
ATOM 10 CA LYS A 2 30.178 8.269 4.313 1.00 13.28 C
ATOM 11 C LYS A 2 28.999 8.963 3.640 1.00 16.12 C
ATOM 12 O LYS A 2 29.027 9.224 2.435 1.00 17.54 O
ATOM 13 CB LYS A 2 30.534 6.982 3.574 1.00 13.33 C
ATOM 14 CG LYS A 2 31.829 6.365 4.059 1.00 14.70 C
ATOM 15 CD LYS A 2 32.140 5.082 3.331 1.00 17.22 C
ATOM 16 CE LYS A 2 33.340 4.422 3.957 1.00 17.71 C
ATOM 17 NZ LYS A 2 33.629 3.104 3.340 1.00 20.50 N
Example - Metal coordination from PDB 6HBW:
In this example, the iron of the heme group (HEM, residue number 153) in hemoglobin is coordinated to a single chain (chain A), and as a result the heme is given chain ID A. The coordinate records for the HEM group are listed following the TER card ending the last polymer chain. The heme group is assigned a unique residue number within the chain.
PDB file snippet from 6HBW:
ATOM 4387 N HIS D 146 29.948 11.544 57.310 1.00 15.88 N
ATOM 4388 CA HIS D 146 31.400 11.604 57.355 1.00 12.40 C
ATOM 4389 C HIS D 146 31.887 10.389 58.126 1.00 12.15 C
ATOM 4390 O HIS D 146 31.027 9.700 58.721 1.00 12.36 O
ATOM 4391 CB HIS D 146 31.831 12.883 58.089 1.00 11.82 C
ATOM 4392 CG HIS D 146 31.346 12.951 59.508 1.00 12.27 C
ATOM 4393 ND1 HIS D 146 32.108 12.536 60.579 1.00 13.29 N
ATOM 4394 CD2 HIS D 146 30.158 13.356 60.024 1.00 10.91 C
ATOM 4395 CE1 HIS D 146 31.416 12.684 61.698 1.00 10.62 C
ATOM 4396 NE2 HIS D 146 30.232 13.180 61.387 1.00 13.85 N
ATOM 4397 OXT HIS D 146 33.113 10.165 58.158 1.00 10.96 O
TER 4398 HIS D 146
HETATM 4399 FE HEM A 153 29.582 -8.922 58.222 1.00 10.43 FE
HETATM 4400 CHA HEM A 153 29.810 -9.479 61.617 1.00 9.96 C
HETATM 4401 CHB HEM A 153 31.525 -11.751 57.700 1.00 10.61 C
HETATM 4402 CHC HEM A 153 29.970 -8.108 54.946 1.00 4.45 C
HETATM 4403 CHD HEM A 153 28.773 -5.622 58.888 1.00 2.72 C
HETATM 4404 N A HEM A 153 30.321 -10.363 59.385 1.00 9.59 N
Example - Coordination to multiple chains from PDB 1BV7:
In this example, the drug (residue name XV6) has non-covalent interactions with both chains A and B. As a result, it is assigned a residue number but no chain ID.
PDB file snippet from 1BV7:
ATOM 1516 CG PHE B 99 -12.923 35.142 33.334 1.00 27.60 C
ATOM 1517 CD1 PHE B 99 -12.552 34.725 32.037 1.00 28.26 C
ATOM 1518 CD2 PHE B 99 -11.933 35.632 34.219 1.00 30.81 C
ATOM 1519 CE1 PHE B 99 -11.200 34.802 31.628 1.00 30.04 C
ATOM 1520 CE2 PHE B 99 -10.575 35.711 33.815 1.00 28.55 C
ATOM 1521 CZ PHE B 99 -10.216 35.292 32.517 1.00 30.04 C
TER 1522 PHE B 99
HETATM 1523 O1 XV6 638 -8.243 14.227 27.865 1.00 16.36 O
HETATM 1524 O4 XV6 638 -11.697 18.691 28.877 1.00 13.52 O
HETATM 1525 O5 XV6 638 -10.104 19.492 26.750 1.00 21.59 O
HETATM 1526 N2 XV6 638 -9.653 15.574 28.900 1.00 14.71 N
HETATM 1527 N7 XV6 638 -8.686 16.093 26.792 1.00 16.77 N
HETATM 1528 C1 XV6 638 -8.859 15.282 27.852 1.00 16.75 C
HETATM 1529 C2 XV6 638 -9.421 14.815 30.135 1.00 16.59 C
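In a standard PDB file the chain ID occupies a fixed column, so a ligand without a chain ID shows a blank there. The sketch below reads the residue name (columns 18-20), chain ID (column 22) and residue number (columns 23-26) from an ATOM or HETATM record. It assumes the fixed-column layout of a real PDB file, which the snippets in this document do not necessarily preserve. Both function names are hypothetical, and `make_record` exists only to build correctly aligned sample lines.

```python
def make_record(record, serial, atom, res_name, chain_id, res_seq, x, y, z, occ, b):
    """Hypothetical helper: lay out one coordinate record in fixed columns."""
    return (
        f"{record:<6}{serial:>5} {atom:<4} {res_name:>3} {chain_id:1}"
        f"{res_seq:>4}    {x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{b:6.2f}"
    )


def residue_identity(line):
    """Return (residue name, chain ID or None, residue number)."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not a coordinate record")
    res_name = line[17:20].strip()       # columns 18-20
    chain_id = line[21].strip() or None  # column 22; blank means no chain ID
    res_seq = int(line[22:26])           # columns 23-26
    return res_name, chain_id, res_seq


# Values taken from the 1D2F and 1BV7 snippets in this document.
bound = make_record("HETATM", 5720, "N1", "PLP", "A", 400,
                    -29.825, 12.803, -19.612, 1.00, 54.54)
free = make_record("HETATM", 1523, "O1", "XV6", " ", 638,
                   -8.243, 14.227, 27.865, 1.00, 16.36)
print(residue_identity(bound))
print(residue_identity(free))
```

Reading by column rather than splitting on whitespace matters here: with a blank chain ID, whitespace splitting would shift every later field by one position.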
Q. How are coordinated solvent molecules distinguished from the other solvent molecules in the coordinate list, for example (Mg+6H2O)?
A. A magnesium ion coordinated with waters is treated differently from a non-coordinated magnesium ion. For example, the residue name for a magnesium ion coordinated with six waters is MO6. These waters are not further included in the list of solvent coordinates.
Example: From PDB 1D57:
In this example, the metal hydrate ligand interacts with two strands of DNA (chains A and B) and it is not assigned a chain ID.
PDB file snippet from 1D57:
ATOM 400 O6 G B 20 11.533 8.948 -9.338 1.00 10.00
ATOM 401 N1 G B 20 9.320 8.794 -9.059 1.00 13.14
ATOM 402 C2 G B 20 8.202 8.050 -8.814 1.00 19.82
ATOM 403 N2 G B 20 7.076 8.770 -8.777 1.00 20.67
ATOM 404 N3 G B 20 8.156 6.721 -8.622 1.00 14.41
ATOM 405 C4 G B 20 9.378 6.163 -8.693 1.00 14.13
TER 406 G B 20
HETATM 407 MG MO6 1 15.457 6.749 3.418 1.00 35.37
HETATM 408 OA MO6 1 14.443 7.465 1.820 1.00 15.39
HETATM 409 OB MO6 1 16.470 6.005 4.945 1.00 21.41
HETATM 410 OC MO6 1 15.236 4.898 2.627 1.00 10.50
HETATM 411 OD MO6 1 15.642 8.604 4.107 1.00 23.92
HETATM 412 OE MO6 1 13.754 6.563 4.444 1.00 18.05
HETATM 413 OF MO6 1 17.174 6.967 2.392 1.00 32.95
Q. What are the minimum requirements for polymers in PDB entries?
A. Polypeptide systems with a chain length of 3 or greater are treated as polymers in PDB entries. Smaller systems are treated as independent ligands if they contain more than 10 atoms or as modifications if they have 10 or fewer atoms. Polynucleotide or polysaccharide systems with chain lengths of 2 or greater are treated as polymers.
Each polymer chain is assigned a unique identifier (chain ID). For proteins and nucleic acids, SEQRES records for each chain are provided. Polysaccharides are not assigned SEQRES records or TER records.
Covalently bound polymeric ligands are assigned chain IDs. Connections between polymeric groups are identified in LINK records.
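The minimum chain lengths above can be captured in a small lookup. This is an illustrative sketch only, and the names `MIN_POLYMER_LENGTH` and `is_polymer` are hypothetical.

```python
# Minimum number of residues for a system to be treated as a polymer.
MIN_POLYMER_LENGTH = {
    "polypeptide": 3,
    "polynucleotide": 2,
    "polysaccharide": 2,
}


def is_polymer(kind, chain_length):
    """Return True if a system of this kind and length is treated as a polymer."""
    return chain_length >= MIN_POLYMER_LENGTH[kind]


print(is_polymer("polypeptide", 9))     # a nine-residue peptide
print(is_polymer("polypeptide", 2))     # a dipeptide
print(is_polymer("polysaccharide", 2))  # a disaccharide
```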
Example: Non-covalently bound polymeric ligands from PDB 1A1M:
In this example, an MHC class I molecule is complexed (not covalently bound) with a peptide from the gag protein of HIV2. Since the peptide has more than 3 amino acids, it is assigned its own chain ID (chain C) as well as SEQRES records.
PDB file snippet from 1A1M:
SEQRES 21 A 278 VAL GLN HIS GLU GLY LEU PRO LYS PRO LEU THR LEU
SEQRES 22 A 278 TRP GLU PRO HIS HIS
SEQRES 1 B 99 ILE GLN ARG THR PRO LYS ILE GLN VAL TYR SER ARG
SEQRES 2 B 99 PRO ALA GLU ASN GLY LYS SER ASN PHE LEU ASN CYS
SEQRES 7 B 99 ALA CYS ARG VAL ASN HIS VAL THR LEU SER GLN PRO
SEQRES 8 B 99 ILE VAL LYS TRP ASP ARG ASP MET
SEQRES 1 C 9 THR PRO TYR ASP ILE ASN GLN MET LEU
ATOM 3174 CB MET C 8 -8.690 29.342 19.095 1.00 38.72
ATOM 3175 CG MET C 8 -9.946 30.151 19.281 1.00 46.68
ATOM 3176 SD MET C 8 -10.527 30.652 17.646 1.00 62.25
ATOM 3177 CE MET C 8 -10.750 28.993 16.801 1.00 58.81
ATOM 3178 N LEU C 9 -8.919 29.066 22.526 1.00 18.36
ATOM 3179 CA LEU C 9 -9.595 28.398 23.619 1.00 15.34
ATOM 3180 C LEU C 9 -11.026 28.004 23.281 1.00 17.65
ATOM 3181 O LEU C 9 -11.535 28.452 22.235 1.00 19.30
ATOM 3182 CB LEU C 9 -9.529 29.270 24.883 1.00 8.08
ATOM 3183 CG LEU C 9 -8.416 28.899 25.866 1.00 12.64
ATOM 3184 CD1 LEU C 9 -7.078 28.818 25.136 1.00 9.87
ATOM 3185 CD2 LEU C 9 -8.350 29.852 27.061 1.00 12.13
ATOM 3186 OXT LEU C 9 -11.635 27.245 24.060 1.00 22.69
TER 3187 LEU C 9
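SEQRES records for a chain can be gathered and checked against the residue count each record declares. The sketch below is a simplification: it splits each record on whitespace, which assumes every SEQRES line carries a non-blank chain ID (a fixed-column reader would be needed otherwise). The function name `read_seqres` is hypothetical.

```python
from collections import defaultdict


def read_seqres(lines):
    """Return {chain_id: (declared_length, [residue names])}."""
    declared = {}
    residues = defaultdict(list)
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "SEQRES":
            continue  # ignore every other record type
        chain_id, num_res = fields[2], int(fields[3])
        declared[chain_id] = num_res
        residues[chain_id].extend(fields[4:])
    return {chain: (declared[chain], residues[chain]) for chain in declared}


# The peptide chain C from 1A1M, followed by an unrelated record.
sample = [
    "SEQRES 1 C 9 THR PRO TYR ASP ILE ASN GLN MET LEU",
    "TER 3187 LEU C 9",
]
chains = read_seqres(sample)
declared, names = chains["C"]
print(declared, len(names), declared == len(names))
```

Comparing the declared count with the number of residue names collected is a quick way to notice a truncated or elided SEQRES block.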
Q. How are glycoproteins described?
A. Covalently bound sugars are handled as HET groups with LINK records to define the points of attachment. Individual sugars have individual residue numbers. A polysaccharide with a chain length of 2 or greater is treated as a polymer; otherwise it is treated as a ligand or a modification. For covalently bound polysaccharide polymers, the entire attached polysaccharide has a unique chain ID but no SEQRES records. The residue(s) to which the polysaccharide is attached are assigned standard residue names. The modification is further described in MODRES/HET/HETNAM/FORMUL and CONECT records.
Example: Covalently bound polymeric sugar from PDB 1EBV:
In this example, the protein is glycosylated at ASN 144. Since the polysaccharide in this example has two or more sugars, it is considered a polymer chain. The ASN retains its original name and the modification is described with MODRES/HET/HETNAM/FORMUL/CONECT records. The point of attachment is defined in LINK records. The attached sugar chain is assigned its own residue numbers and chain ID B, but is not assigned SEQRES records. The individual NAG groups retain their original names.
PDB file snippet from 1EBV:
MODRES 1EBV ASN A 144 ASN GLYCOSYLATION SITE
HET NAG B 671 14
HET NAG B 672 14
HETNAM NAG N-ACETYL-D-GLUCOSAMINE
HETSYN NAG NAG
FORMUL 2 NAG 4(C8 H15 N1 O6)
LINK C1 NAG B 672 O4 NAG B 671
LINK C1 NAG B 671 ND2 ASN A 144
ATOM 920 N ASN A 144 43.703 33.213 177.254 1.00 4.54 N
ATOM 921 CA ASN A 144 42.866 34.312 176.796 1.00 3.72 C
ATOM 922 C ASN A 144 41.849 34.690 177.853 1.00 2.92 C
ATOM 923 O ASN A 144 40.825 34.032 177.991 1.00 3.13 O
ATOM 924 CB ASN A 144 42.144 33.903 175.522 1.00 3.53 C
ATOM 925 CG ASN A 144 41.582 35.079 174.778 1.00 3.32 C
ATOM 926 OD1 ASN A 144 41.113 36.032 175.383 1.00 2.96 O
ATOM 927 ND2 ASN A 144 41.627 34.998 173.456 1.00 4.30 N
TER 4482 PRO A 583
HETATM 4497 C1 NAG B 671 40.875 35.906 172.616 1.00 5.71 C
HETATM 4498 C2 NAG B 671 41.877 36.712 171.783 1.00 6.56 C
HETATM 4499 C3 NAG B 671 41.200 37.516 170.670 1.00 8.16 C
HETATM 4500 C4 NAG B 671 40.242 36.632 169.860 1.00 9.68 C
HETATM 4501 C5 NAG B 671 39.276 35.931 170.821 1.00 8.33 C
HETATM 4502 C6 NAG B 671 38.372 34.966 170.106 1.00 9.02 C
HETATM 4503 C7 NAG B 671 43.910 37.420 172.859 1.00 6.57 C
HETATM 4504 C8 NAG B 671 44.748 38.672 173.041 1.00 6.37 C
HETATM 4505 N2 NAG B 671 42.608 37.603 172.659 1.00 6.16 N
HETATM 4506 O3 NAG B 671 42.193 38.056 169.806 1.00 6.72 O
HETATM 4507 O4 NAG B 671 39.505 37.448 168.923 1.00 13.39 O
HETATM 4508 O5 NAG B 671 40.013 35.139 171.768 1.00 7.61 O
HETATM 4509 O6 NAG B 671 38.959 33.674 170.094 1.00 10.91 O
HETATM 4510 O7 NAG B 671 44.448 36.305 172.890 1.00 6.40 O
HETATM 4511 C1 NAG B 672 39.400 37.017 167.608 1.00 16.40 C
HETATM 4512 C2 NAG B 672 38.396 37.902 166.873 1.00 18.68 C
HETATM 4513 C3 NAG B 672 38.341 37.541 165.382 1.00 19.71 C
HETATM 4514 C4 NAG B 672 39.728 37.444 164.747 1.00 19.18 C
HETATM 4515 C5 NAG B 672 40.705 36.655 165.636 1.00 18.59 C
HETATM 4516 C6 NAG B 672 42.132 36.811 165.158 1.00 19.23 C
HETATM 4517 C7 NAG B 672 36.708 38.503 168.493 1.00 19.91 C
HETATM 4518 C8 NAG B 672 35.590 37.969 169.374 1.00 18.85 C
HETATM 4519 N2 NAG B 672 37.078 37.747 167.462 1.00 19.61 N
HETATM 4520 O3 NAG B 672 37.598 38.531 164.688 1.00 21.61 O
HETATM 4521 O4 NAG B 672 39.610 36.811 163.473 1.00 18.40 O
HETATM 4522 O5 NAG B 672 40.682 37.149 166.993 1.00 17.38 O
HETATM 4523 O6 NAG B 672 42.788 37.864 165.851 1.00 18.94 O
HETATM 4524 O7 NAG B 672 37.230 39.591 168.751 1.00 20.57 O
Q. What is done if only a portion of a bound ligand is included in the coordinates because of crystallographic disorder?
A. If the chemistry of the ligand is known, the ligand is treated normally even if there are missing atoms.
Q. How are chimeras described?
A. Chimeric molecules are described as a single chain with a continuous sequence. Residue numbering proceeds throughout the entire chimera.
Example from 1TOL:
In this example the fusion protein comprises residues 1-86 of mature minor coat protein from gene III, including glycine-rich linker (GGGSEGGGSEGGGSEGGG), residues 295-421 of protein-TOLA, and the C-terminal tail with sequence (AAAHHHHHH).
PDB file snippet from 1TOL:
SEQRES 1 A 222 ALA GLU THR VAL GLU SER CYS LEU ALA LYS SER HIS THR
SEQRES 2 A 222 GLU ASN SER PHE THR ASN VAL TRP LYS ASP ASP LYS THR
SEQRES 3 A 222 LEU ASP ARG TYR ALA ASN TYR GLU GLY CYS LEU TRP ASN
SEQRES 4 A 222 ALA THR GLY VAL VAL VAL CYS THR GLY ASP GLU THR GLN
SEQRES 5 A 222 CYS TYR GLY THR TRP VAL PRO ILE GLY LEU ALA ILE PRO
SEQRES 6 A 222 GLU ASN GLU GLY GLY GLY SER GLU GLY GLY GLY SER GLU
SEQRES 7 A 222 GLY GLY GLY SER GLU GLY GLY GLY ASP ASP ILE PHE GLY
SEQRES 8 A 222 GLU LEU SER SER GLY LYS ASN ALA PRO LYS THR GLY GLY
SEQRES 9 A 222 GLY ALA LYS GLY ASN ASN ALA SER PRO ALA GLY SER GLY
SEQRES 10 A 222 ASN THR LYS ASN ASN GLY ALA SER GLY ALA ASP ILE ASN
SEQRES 11 A 222 ASN TYR ALA GLY GLN ILE LYS SER ALA ILE GLU SER LYS
SEQRES 12 A 222 PHE TYR ASP ALA SER SER TYR ALA GLY LYS THR CYS THR
SEQRES 13 A 222 LEU ARG ILE LYS LEU ALA PRO ASP GLY MET LEU LEU ASP
SEQRES 14 A 222 ILE LYS PRO GLU GLY GLY ASP PRO ALA LEU CYS GLN ALA
SEQRES 15 A 222 ALA LEU ALA ALA ALA LYS LEU ALA LYS ILE PRO LYS PRO
SEQRES 16 A 222 PRO SER GLN ALA VAL TYR GLU VAL PHE LYS ASN ALA PRO
SEQRES 17 A 222 LEU ASP PHE LYS PRO ALA ALA ALA HIS HIS HIS HIS HIS
SEQRES 18 A 222 HIS
Q. How are TER records used?
A. TER cards are used to unambiguously mark the ends of polymer chains, except for polysaccharides.
Q. How are models in NMR ensembles organized in PDB entries?
A. In older NMR files the atom serial numbers were consecutive across all models in the entry. Owing to limitations in the field width for the atom serial number, this practice resulted in some ensembles being divided among multiple PDB entries.
With entries processed since February 1999, the atom serial number is reset to 1 at the beginning of each model thereby allowing the full ensemble to be included within a single PDB entry.
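The effect of the serial number reset can be illustrated with a little arithmetic. The atom serial number occupies a five-column field, so the largest value it can hold is 99999. The sketch below is a simplification that ignores the serial numbers consumed by TER records, and the function name `ensemble_fits` is hypothetical.

```python
MAX_SERIAL = 99999  # largest value that fits a five-column serial field


def ensemble_fits(atoms_per_model, n_models, reset_per_model):
    """Return True if the highest atom serial number fits in the field."""
    if reset_per_model:
        # Numbering restarts at 1 in every model.
        highest = atoms_per_model
    else:
        # Numbering runs consecutively across all models.
        highest = atoms_per_model * n_models
    return highest <= MAX_SERIAL


# A hypothetical ensemble of 30 models with 5000 atoms each.
print(ensemble_fits(5000, 30, reset_per_model=False))
print(ensemble_fits(5000, 30, reset_per_model=True))
```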
© RCSB