PDB Format Guide, Part 14

Field Formats

Each record type is presented in a table that divides the record into fields and gives, for each field, its column numbers, data type, field name (or a quoted string which must appear in the field), and definition. Any column not specified must be left blank.

Each field contains an identified data type which can be validated by a program. These are:

DATA TYPE          DESCRIPTION
----------------------------------------------------------------------------------
AChar              An alphabetic character (A-Z, a-z).

Atom               Atom name which follows the naming rules in Appendix 3.

Character          Any non-control character in the ASCII character set or a
                   space.

Continuation       A two-character field that is either blank (for the first
                   record of a set) or contains a two digit number
                   right-justified and blank-filled which counts continuation
                   records starting with 2.  The continuation number must be
                   followed by a blank.

Date               A 9 character string in the form DD-MMM-YY where DD is the
                   day of the month, zero-filled on the left (e.g., 04); MMM is
                   the common English 3-letter abbreviation of the month; and
                   YY is a year in the 20th century.  This must represent a
                   valid date.

IDcode             A PDB identification code which consists of 4 characters,
                   the first of which is a digit in the range 0 - 9; the
                   remaining 3 are alpha-numeric, and letters are upper case
                   only.  Entries with a 0 as the first character do not
                   contain coordinate data.

Integer            Right-justified blank-filled integer value.

Token              A sequence of non-space characters followed by a colon and a
                   space.

List               A String that is composed of text separated with commas.

LString            A literal string of characters.  All spacing is significant
                   and must be preserved.

LString(n)         An LString with exactly n characters.

Real(n,m)          Real (floating point) number in the FORTRAN format Fn.m.

Record name        The name of the record: 6 characters, left-justified and
                   blank-filled.

Residue name       One of the standard amino acids or nucleic acids, as listed
                   below, or the non-standard group designation as defined in
                   the HET dictionary.  Field is right-justified.

SList              A String that is composed of text separated with semi-colons.

Specification      A String composed of a token and its associated value
                   separated by a colon.

Specification      A sequence of Specifications, separated by semi-colons.
  list

String             A sequence of characters.  These characters may have
                   arbitrary spacing, but should be interpreted as directed
                   below.

String(n)          A String with exactly n characters.

SymOP              An integer field of 4 to 6 digits, right-justified, of the
                   form nnnMMM where nnn is the symmetry operator number and
                   MMM is the translation vector.  See details in Appendix 1.
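
Since each data type is meant to be checkable by a program, a few of the definitions above can be sketched as Python checks. This is a minimal illustration, not a reference validator. The function names are hypothetical, and several choices are assumptions that the table does not state: the Date year YY is mapped to 1900 + YY, month abbreviations are accepted in either case, and Real(n,m) is checked in its right-justified output form (a field of width n with exactly m digits after the decimal point).

```python
import datetime
import re

MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


def is_idcode(text):
    """IDcode: a digit followed by 3 digits or upper-case letters."""
    return re.fullmatch(r"[0-9][0-9A-Z]{3}", text) is not None


def is_integer(text):
    """Integer: right-justified, blank-filled (no trailing blanks)."""
    return re.fullmatch(r" *-?[0-9]+", text) is not None


def is_continuation(text):
    """Continuation: two blanks, or a right-justified number from 2 to 99."""
    return text == "  " or re.fullmatch(r" [2-9]|[1-9][0-9]", text) is not None


def is_real(text, n, m):
    """Real(n,m): width n, exactly m digits after the decimal point.

    Assumption: only the right-justified output form of Fn.m is accepted.
    """
    if len(text) != n or not any(ch.isdigit() for ch in text):
        return False
    return re.fullmatch(r" *[+-]?[0-9]*\.[0-9]{%d}" % m, text) is not None


def parse_date(text):
    """Date: DD-MMM-YY; raises ValueError if the form or the date is invalid.

    Assumption: YY is interpreted as the year 1900 + YY.
    """
    match = re.fullmatch(r"([0-9]{2})-([A-Za-z]{3})-([0-9]{2})", text)
    if match is None:
        raise ValueError("not in DD-MMM-YY form: %r" % text)
    day, month, year = match.groups()
    if month.upper() not in MONTHS:
        raise ValueError("unknown month abbreviation: %r" % month)
    # datetime.date itself rejects impossible days such as 31-FEB.
    return datetime.date(1900 + int(year), MONTHS.index(month.upper()) + 1, int(day))


def parse_symop(text):
    """SymOP: nnnMMM, returned as (operator number, translation vector)."""
    digits = text.lstrip(" ")
    if re.fullmatch(r"[0-9]{4,6}", digits) is None:
        raise ValueError("not a 4 to 6 digit SymOP: %r" % text)
    return int(digits[:-3]), digits[-3:]
```

The translation vector is kept as a three-character string because its interpretation is given in Appendix 1, not here.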

To interpret a String, concatenate the contents of all continued fields together, collapse all sequences of multiple blanks to a single blank, and remove any leading and trailing blanks. This permits very long strings to be properly reconstructed.
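
The three steps just described can be sketched in Python as follows. The function name interpret_string is hypothetical, and the sketch assumes the caller has already extracted the String field from each record of the continued set, in order.

```python
import re


def interpret_string(fields):
    """Rebuild a String from the field contents of a set of continued records."""
    joined = "".join(fields)                    # concatenate all continued fields
    collapsed = re.sub(r" {2,}", " ", joined)   # collapse runs of blanks to one blank
    return collapsed.strip(" ")                 # drop leading and trailing blanks


if __name__ == "__main__":
    parts = ["  CRYSTAL   STRUCTURE OF ", " A  PROTEIN  "]
    print(interpret_string(parts))
```

Note that this applies only to the String type. An LString must not be passed through such a function, because all of its spacing is significant.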

The above information about field formats is repeated as Appendix 6.
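
As an illustration of the List, SList, and Specification list types, the sketch below splits an already reconstructed String into its parts. The function names and the sample tokens are hypothetical, chosen only for illustration. The splitting is deliberately naive: it assumes that commas, semi-colons, and the first colon in each Specification act only as separators.

```python
def parse_list(text):
    """List: text separated with commas."""
    return [item.strip() for item in text.split(",")]


def parse_slist(text):
    """SList: text separated with semi-colons."""
    return [item.strip() for item in text.split(";")]


def parse_specification_list(text):
    """Specification list: 'token: value' pairs separated with semi-colons.

    Returns a list of (token, value) tuples in their original order.
    """
    specs = []
    for piece in text.split(";"):
        piece = piece.strip()
        if not piece:
            continue  # ignore the empty piece after a trailing semi-colon
        token, colon, value = piece.partition(":")
        if not colon:
            raise ValueError("Specification without a colon: %r" % piece)
        specs.append((token.strip(), value.strip()))
    return specs


if __name__ == "__main__":
    # Sample input with made-up content, for illustration only.
    print(parse_specification_list("MOL_ID: 1; MOLECULE: HEMOGLOBIN;"))
```

Returning a list of pairs, not a dictionary, preserves the order of the Specifications and allows a token to appear more than once.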

Residue Names

Standard residue names used in PDB entries:

RESIDUE TYPE       RESIDUE NAME
----------------------------------------------------------------------------------
Amino acids        ALA, ARG, ASN, ASP, CYS, GLN, GLU, GLY, HIS, ILE, LEU, LYS,
                   MET, PHE, PRO, SER, THR, TRP, TYR, VAL, ASX, GLX

Nucleic acids      A, C, G, T, U, I, +A, +C, +G, +T, +U, +I

Other              UNK (unknown)

See Appendix 4 for more information on the standard residue names and abbreviations, and Appendix 5 for their chemical formulas and molecular weights.
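
A residue name field can be checked against the table above with a simple lookup. The following is a sketch under stated assumptions: the function name classify_residue is hypothetical, the field width is taken from whatever string the caller passes in, and a result of None only means the name is not one of the standard names, so it would still have to be looked up in the HET dictionary, which is not modelled here.

```python
AMINO_ACIDS = frozenset(
    "ALA ARG ASN ASP CYS GLN GLU GLY HIS ILE LEU LYS "
    "MET PHE PRO SER THR TRP TYR VAL ASX GLX".split()
)
NUCLEIC_ACIDS = frozenset("A C G T U I +A +C +G +T +U +I".split())
OTHER = frozenset(["UNK"])


def classify_residue(field):
    """Classify a right-justified residue name field.

    Returns "amino acid", "nucleic acid", "other", or None when the name is
    not a standard residue name. Raises ValueError if the field is not
    right-justified.
    """
    name = field.strip(" ")
    if name.rjust(len(field)) != field:
        raise ValueError("residue name is not right-justified: %r" % field)
    if name in AMINO_ACIDS:
        return "amino acid"
    if name in NUCLEIC_ACIDS:
        return "nucleic acid"
    if name in OTHER:
        return "other"
    return None  # not standard; may be a group from the HET dictionary
```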