PDB Format Guide, Part 20

COMPND

Overview

The COMPND record describes the macromolecular contents of an entry. Each macromolecule found in the entry is described by a set of token: value pairs, and is referred to as a COMPND record component. Since the concept of a molecule is difficult to specify exactly, PDB staff may exercise editorial judgment in consultation with depositors in assigning these names.

For each macromolecular component, the molecule name, synonyms, number assigned by the Enzyme Commission (EC), and other relevant details are specified.

Record Format

COLUMNS        DATA TYPE         FIELD          DEFINITION
----------------------------------------------------------------------------------
 1 -  6        Record name       "COMPND"

 9 - 10        Continuation      continuation   Allows concatenation of multiple
                                                records.

11 - 70        Specification     compound       Description of the molecular
               list                             components.
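
The column layout above can be illustrated with a short Python sketch. It assumes the columns in the table are 1-based and inclusive, and that a blank continuation field marks the first line of the record (treated here as continuation 1). The function name split_compnd_line is hypothetical and is not part of any PDB software.

```python
def split_compnd_line(line):
    """Split one COMPND line into (continuation, text) using the column table.

    Columns in the table are 1-based and inclusive, so columns 9-10 map to
    the Python slice [8:10] and columns 11-70 map to [10:70].
    """
    padded = line.rstrip("\n").ljust(70)
    if padded[0:6] != "COMPND":
        raise ValueError("not a COMPND record")
    cont_field = padded[8:10].strip()
    # Assumption: a blank continuation field means the first line (1).
    continuation = int(cont_field) if cont_field else 1
    # Strip both sides: continuation lines carry a leading blank in column 11.
    text = padded[10:70].strip()
    return continuation, text
```

For instance, split_compnd_line("COMPND   3 CHAIN: A, B, C, D;") gives (3, "CHAIN: A, B, C, D;").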

Details

* The compound field of the COMPND record is a Specification list. The specifications, or tokens, that may be used are listed below:

TOKEN                   VALUE DEFINITION
---------------------------------------------------------------------------------
MOL_ID                  Numbers each component; also used in SOURCE to associate
                        the information.

MOLECULE                Name of the macromolecule.

CHAIN                   Comma-separated list of chain identifier(s). "NULL" is
                        used to indicate a blank chain identifier.

FRAGMENT                Specifies a domain or region of the molecule.

SYNONYM                 Comma-separated list of synonyms for the MOLECULE.

EC                      The Enzyme Commission number associated with the
                        molecule. If there is more than one EC number, they
                        are presented as a comma-separated list.

ENGINEERED              Indicates that the molecule was produced using
                        recombinant technology or by purely chemical synthesis.

MUTATION                Describes mutations from the wild type molecule.

BIOLOGICAL_UNIT         If the MOLECULE functions as part of a larger
                        biological unit, the entire functional unit may be
                        described.

OTHER_DETAILS           Additional comments.

* In the general case the PDB tends to reflect the biological/functional view of the molecule. For example, the hetero-tetramer hemoglobin molecule is treated as a discrete component in COMPND.

* In the case of synthetic molecules, e.g., hybrids, the description will be provided by the depositor.

* No specific rules apply to the ordering of the tokens, except that the occurrence of MOL_ID or FRAGMENT indicates that the subsequent tokens are related to that specific molecule or fragment of the molecule.

* Physical layout of these items may be altered by PDB staff to improve human readability of the COMPND record.

* Asterisks in nucleic acid names (in MOLECULE) are for ease of reading.

* When insertion codes are given as part of the residue name, they must be given within square brackets, e.g., H57[A]N. This might occur when listing residues in FRAGMENT, MUTATION, or OTHER_DETAILS.

* For multi-chain molecules, e.g., the hemoglobin tetramer, a comma-separated list of CHAIN identifiers is used.

* When non-blank chain identifiers occur in the entry, they must be specified.

* NULL is used to indicate blank chain identifiers, e.g., "CHAIN: NULL" or "CHAIN: NULL, B, C".

* For enzymes, if no EC number has been assigned, "EC: NOT ASSIGNED" is used.

* ENGINEERED is followed either by "YES" or by a comment.

* For the token MUTATION, the following set of examples illustrates the conventions used by the PDB to represent various types of mutations.

   MUTATION TYPE         DESCRIPTION                     FORM
   ------------------------------------------------------------------------------
   Simple substitution   His 57 replaced by Asn          H57N

                         His 57A replaced by Asn, in
                         chain C only                    Chain C, H57[A]N

   Insertion             His and Pro inserted before
                         Lys 48                          INS(HP-K48)

   Deletion              Arg 141 of chains A and C
                         deleted, not deleted in
                         chain B                         Chain A, C, DEL(R141)

                         His 23 through Arg 26 deleted   DEL(23-26)

                         His 23C and Arg 26 deleted
                         from chain B only               Chain B, DEL(H23[C],R26)

* When there are more than ten mutations:

- All the mutations are listed in the SEQADV record.

- Some mutations may be listed in MUTATION in COMPND to highlight the most important ones, at the depositor's discretion.

* New tokens may be added by the PDB as needed.
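
The MUTATION forms tabulated above can be recognized with a small Python sketch. It covers only the forms shown in the table (an optional "Chain X, Y," prefix, simple substitutions, INS(...) and DEL(...)), it assumes single-character chain identifiers and one-letter residue codes, and it leaves the contents of INS(...) and DEL(...) uninterpreted. The function name parse_mutation and the dictionary keys are hypothetical.

```python
import re

# Optional "Chain X, Y," prefix followed by the mutation form itself.
_CHAIN_PREFIX = re.compile(r"^CHAIN\s+((?:\w\s*,\s*)*\w)\s*,\s*(.+)$", re.IGNORECASE)
# Simple substitution such as H57N, or H57[A]N with a bracketed insertion code.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)(?:\[(\w)\])?([A-Z])$")


def parse_mutation(value):
    """Classify one MUTATION value written in one of the tabulated forms."""
    value = value.strip()
    chains = []
    match = _CHAIN_PREFIX.match(value)
    if match:
        chains = [c.strip() for c in match.group(1).split(",")]
        value = match.group(2).strip()
    upper = value.upper()
    if upper.startswith("INS(") and upper.endswith(")"):
        return {"chains": chains, "type": "insertion", "detail": value[4:-1]}
    if upper.startswith("DEL(") and upper.endswith(")"):
        return {"chains": chains, "type": "deletion", "detail": value[4:-1]}
    sub = _SUBSTITUTION.match(upper)
    if sub:
        old, number, icode, new = sub.groups()
        return {"chains": chains, "type": "substitution", "old": old,
                "number": int(number), "insertion_code": icode, "new": new}
    raise ValueError(f"unrecognized mutation form: {value!r}")
```

An empty "chains" list means that no chain restriction was written, as in the plain H57N form.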

Verification/Validation/Value Authority Control

CHAIN must match the chain identifier(s) of the molecule(s). EC numbers are checked against the Enzyme Data Bank.
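
The CHAIN check can be sketched in Python as a set comparison. This is an illustration only, not the PDB's validation procedure: it assumes the chain identifiers found elsewhere in the entry have already been collected by the caller, with a blank identifier passed as a single space. The function name check_chains is hypothetical.

```python
def check_chains(compnd_chains, observed_chains):
    """Compare CHAIN values from COMPND with chain identifiers seen in the entry.

    compnd_chains: identifiers as written in COMPND, where "NULL" means blank.
    observed_chains: identifiers found elsewhere in the entry (blank is " ").
    Returns (missing_from_compnd, not_in_entry) as sorted lists.
    """
    # Translate NULL to a blank identifier so both sides use the same form.
    declared = {" " if c.strip() == "NULL" else c.strip() for c in compnd_chains}
    observed = set(observed_chains)
    return sorted(observed - declared), sorted(declared - observed)
```

Two empty lists mean that the CHAIN values and the observed identifiers agree.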

Relationships to Other Record Types

Each molecule given a MOL_ID in COMPND must be listed and given the biological source information in SOURCE. In the case of mutations, the SEQADV records will present differences from the reference molecule. REMARK records may further describe the contents of the entry. See also Verification/Validation/Value Authority Control above.

Example

         1         2         3         4         5         6         7
1234567890123456789012345678901234567890123456789012345678901234567890
COMPND    MOL_ID: 1;
COMPND   2 MOLECULE: HEMOGLOBIN;
COMPND   3 CHAIN: A, B, C, D;
COMPND   4 ENGINEERED: YES;
COMPND   5 MUTATION: CHAIN B, D, V1A;
COMPND   6 BIOLOGICAL_UNIT: HEMOGLOBIN EXISTS AS AN A1B1/A2B2
COMPND   7 TETRAMER;
COMPND   8 OTHER_DETAILS: DEOXY FORM

COMPND    MOL_ID: 1;
COMPND   2 MOLECULE: COWPEA CHLOROTIC MOTTLE VIRUS;
COMPND   3 CHAIN: A, B, C;
COMPND   4 SYNONYM: CCMV;
COMPND   5 MOL_ID: 2;
COMPND   6 MOLECULE: RNA (5'-(*AP*UP*AP*U)-3');
COMPND   7 CHAIN: D, F;
COMPND   8 ENGINEERED: YES;
COMPND   9 MOL_ID: 3;
COMPND  10 MOLECULE: RNA (5'-(*AP*U)-3');
COMPND  11 CHAIN: E;
COMPND  12 ENGINEERED: YES

COMPND    MOL_ID: 1;
COMPND   2 MOLECULE: HEVAMINE A;
COMPND   3 CHAIN: NULL;
COMPND   4 EC: 3.2.1.14, 3.2.1.17;
COMPND   5 OTHER_DETAILS: PLANT ENDOCHITINASE/LYSOZYME
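
The example records above can be read back into components with a short Python sketch. It rests on conventions inferred from the examples rather than stated rules: token: value pairs end with a semicolon, and a value that wraps onto the next record is rejoined with a single space. It does not handle values that themselves contain semicolons, and it does not model the scoping effect of FRAGMENT. The function name read_compnd is hypothetical.

```python
def read_compnd(lines):
    """Join COMPND records and group token: value pairs by MOL_ID."""
    parts = []
    for line in lines:
        if line.startswith("COMPND"):
            # Columns 11-70 (1-based, inclusive) hold the specification text.
            parts.append(line[10:70].strip())
    components = []
    for item in " ".join(parts).split(";"):
        token, sep, value = item.partition(":")
        if not sep:
            continue  # skip pieces without a colon, e.g. an empty trailing piece
        token, value = token.strip(), value.strip()
        if token == "MOL_ID":
            # Each MOL_ID starts a new component.
            components.append({"MOL_ID": value})
        elif components:
            # Tokens seen before any MOL_ID are ignored in this sketch.
            if token == "CHAIN":
                # NULL stands for a blank chain identifier.
                value = [" " if c.strip() == "NULL" else c.strip()
                         for c in value.split(",")]
            components[-1][token] = value
    return components
```

Applied to the third example, this yields one component whose CHAIN value is a single blank identifier and whose EC value is the string "3.2.1.14, 3.2.1.17".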