Each record type is presented in a table that divides the record into fields by column number and gives, for each field, its defined data type, its field name (or a quoted string which must appear in the field), and its definition. Any column not specified must be left blank.
Each field has an identified data type which can be validated by a program. The data types are:
DATA TYPE            DESCRIPTION
----------------------------------------------------------------------------------
AChar                An alphabetic character (A-Z, a-z).

Atom                 Atom name which follows the naming rules in Appendix 3.

Character            Any non-control character in the ASCII character set or a
                     space.

Continuation         A two-character field that is either blank (for the first
                     record of a set) or contains a two-digit number
                     right-justified and blank-filled which counts continuation
                     records starting with 2. The continuation number must be
                     followed by a blank.

Date                 A 9 character string in the form dd-mmm-yy where dd is the
                     day of the month, zero-filled on the left (e.g., 04); mmm
                     is the common English 3-letter abbreviation of the month;
                     and yy is a year in the 20th century. This must represent
                     a valid date.

IDcode               A PDB identification code which consists of 4 characters,
                     the first of which is a digit in the range 0 - 9; the
                     remaining 3 are alpha-numeric, and letters are upper case
                     only. Entries with a 0 as the first character do not
                     contain coordinate data.

Integer              Right-justified blank-filled integer value.

Token                A sequence of non-space characters followed by a colon and
                     a space.

List                 A String that is composed of text separated with commas.

LString              A literal string of characters. All spacing is significant
                     and must be preserved.

LString(n)           An LString with exactly n characters.

Real(n,m)            Real (floating point) number in the FORTRAN format Fn.m.

Record name          The name of the record: 6 characters, left-justified and
                     blank-filled.

Residue name         One of the standard amino acids or nucleic acids, as listed
                     below, or the non-standard group designation as defined in
                     the HET dictionary. Field is right-justified.

SList                A String that is composed of text separated with
                     semi-colons.

Specification        A String composed of a token and its associated value
                     separated by a colon.

Specification list   A sequence of Specifications, separated by semi-colons.

String               A sequence of characters. These characters may have
                     arbitrary spacing, but should be interpreted as directed
                     below.

String(n)            A String with exactly n characters.

SymOP                An integer field of 4 to 6 digits, right-justified, of the
                     form nnnMMM where nnn is the symmetry operator number and
                     MMM is the translation vector. See details in Appendix 1.
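Because each data type is meant to be checkable by a program, a few of the definitions above can be sketched as small validators. The following Python is only an illustration under stated assumptions: the function names are hypothetical and not part of the format, "20th century" is read as years 1900 through 1999, and a blank Continuation field is reported as record number 1.

```python
import re
from datetime import date

# English 3-letter month abbreviations, matched case-insensitively (assumption).
MONTHS = {name: i for i, name in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}


def is_id_code(field):
    """IDcode: a digit followed by 3 digits or upper-case letters."""
    return re.fullmatch(r"[0-9][0-9A-Z]{3}", field) is not None


def parse_date(field):
    """Date: dd-mmm-yy; returns a datetime.date, or None if invalid."""
    m = re.fullmatch(r"([0-9]{2})-([A-Za-z]{3})-([0-9]{2})", field)
    if m is None:
        return None
    month = MONTHS.get(m.group(2).upper())
    if month is None:
        return None
    try:
        # Assumption: yy maps onto 1900 + yy.
        return date(1900 + int(m.group(3)), month, int(m.group(1)))
    except ValueError:  # e.g. 31-FEB-90 is not a valid date
        return None


def parse_continuation(field):
    """Continuation: blank (first record) or a right-justified number >= 2."""
    if field == "  ":
        return 1
    if re.fullmatch(r" [2-9]|[1-9][0-9]", field):
        return int(field)
    return None


def parse_integer(field):
    """Integer: right-justified, blank-filled integer value."""
    m = re.fullmatch(r" *(-?[0-9]+)", field)
    return int(m.group(1)) if m else None


def parse_symop(field):
    """SymOP: nnnMMM; returns (operator number, translation digits) or None."""
    m = re.fullmatch(r" *([0-9]{1,3})([0-9]{3})", field)
    if m is None:
        return None
    return int(m.group(1)), m.group(2)
```

Each parser returns None rather than raising an error, so a caller can collect every invalid field in a record before reporting.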
To interpret a String, concatenate the contents of all continued fields together, collapse all sequences of multiple blanks to a single blank, and remove any leading and trailing blanks. This permits very long strings to be properly reconstructed.
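The reconstruction rule for a String can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names are hypothetical, and the input is assumed to be the text fields already extracted from the first record and its continuation records, in order. The second helper shows one possible way to split a reconstructed Specification list into token and value pairs; it assumes that values contain no semi-colons and that the first colon in each Specification ends the token.

```python
import re


def interpret_string(fields):
    """Rebuild a String from the fields of a record and its continuations."""
    joined = "".join(fields)                  # concatenate all continued fields
    collapsed = re.sub(r" {2,}", " ", joined)  # runs of blanks become one blank
    return collapsed.strip(" ")               # drop leading and trailing blanks


def parse_specification_list(text):
    """Split 'TOKEN: value; TOKEN: value' into (token, value) pairs."""
    pairs = []
    for part in text.split(";"):
        part = part.strip()
        if not part:              # tolerate a trailing semi-colon
            continue
        token, sep, value = part.partition(":")
        if not sep:
            raise ValueError("specification without a colon: %r" % part)
        pairs.append((token.strip(), value.strip()))
    return pairs


fields = ["MOL_ID: 1;   MOLECULE:  HEMOGLOBIN  ", " (DEOXY);  "]
text = interpret_string(fields)
print(text)
print(parse_specification_list(text))
```

Note that the fields are joined with no separator, so a word boundary survives only if at least one blank is present at the end of one field or the start of the next.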
The above information about field formats is repeated as Appendix 6.
Residue Names
Standard residue names used in PDB entries:
RESIDUE TYPE         RESIDUE NAME
----------------------------------------------------------------------------------
Amino acids          ALA, ARG, ASN, ASP, CYS, GLN, GLU, GLY, HIS, ILE, LEU, LYS,
                     MET, PHE, PRO, SER, THR, TRP, TYR, VAL, ASX, GLX

Nucleic acids        A, C, G, T, U, I, +A, +C, +G, +T, +U, +I

Other                UNK (unknown)
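A program checking a Residue name field might classify it against the standard names listed above. The sketch below is illustrative only: `classify_residue` is a hypothetical name, and any name that is not in the standard list is simply reported as non-standard, on the assumption that the caller would then look it up in the HET dictionary.

```python
AMINO_ACIDS = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
    "ASX", "GLX",
}
NUCLEIC_ACIDS = {"A", "C", "G", "T", "U", "I",
                 "+A", "+C", "+G", "+T", "+U", "+I"}


def classify_residue(field):
    """Classify a right-justified Residue name field."""
    name = field.strip(" ")   # the field is right-justified, so drop padding
    if name in AMINO_ACIDS:
        return "amino acid"
    if name in NUCLEIC_ACIDS:
        return "nucleic acid"
    if name == "UNK":
        return "other"
    return "non-standard"     # assumption: resolve via the HET dictionary
```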
See Appendix 4 for more information on the standard residue names and abbreviations, and Appendix 5 for their chemical formulas and molecular weights.