Overview
The COMPND record describes the macromolecular contents of an entry. Each macromolecule found in the entry is described by a set of token: value pairs, and is referred to as a COMPND record component. Since the concept of a molecule is difficult to specify exactly, PDB staff may exercise editorial judgment in consultation with depositors in assigning these names.
For each macromolecular component, the molecule name, synonyms, number assigned by the Enzyme Commission (EC), and other relevant details are specified.
Record Format
COLUMNS       DATA TYPE            FIELD          DEFINITION
----------------------------------------------------------------------------------
 1 -  6       Record name          "COMPND"
 9 - 10       Continuation         continuation   Allows concatenation of multiple records.
11 - 70       Specification list   compound       Description of the molecular components.
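The fixed column positions above are enough to split a COMPND line into its fields. The following is a minimal Python sketch, assuming each line is given without its trailing newline; `CompndLine` and `split_compnd_line` are hypothetical names, not part of any PDB library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CompndLine:
    continuation: Optional[int]  # None on the first line of a COMPND group
    text: str                    # specification text (columns 11-70)


def split_compnd_line(line: str) -> CompndLine:
    """Split one COMPND line at the fixed columns listed in the table.

    The table's columns are 1-based and inclusive, so columns a-b
    correspond to the Python slice line[a-1:b].
    """
    if not line.startswith("COMPND"):           # columns 1-6
        raise ValueError(f"not a COMPND record: {line!r}")
    cont = line[8:10].strip()                   # columns 9-10
    return CompndLine(
        continuation=int(cont) if cont else None,
        text=line[10:70].strip(),               # columns 11-70
    )


print(split_compnd_line("COMPND   2 MOLECULE: HEMOGLOBIN;"))
# CompndLine(continuation=2, text='MOLECULE: HEMOGLOBIN;')
```

The first line of a group has a blank continuation field, which the sketch reports as `None` rather than 1.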
Details
* The compound record is a Specification list. The specifications, or tokens, that may be used are listed below:
TOKEN             VALUE DEFINITION
---------------------------------------------------------------------------------
MOL_ID            Numbers each component; also used in SOURCE to associate
                  the information.
MOLECULE          Name of the macromolecule.
CHAIN             Comma-separated list of chain identifier(s). "NULL" is used
                  to indicate a blank chain identifier.
FRAGMENT          Specifies a domain or region of the molecule.
SYNONYM           Comma-separated list of synonyms for the MOLECULE.
EC                The Enzyme Commission number associated with the molecule.
                  If there is more than one EC number, they are presented as
                  a comma-separated list.
ENGINEERED        Indicates that the molecule was produced using recombinant
                  technology or by purely chemical synthesis.
MUTATION          Describes mutations from the wild type molecule.
BIOLOGICAL_UNIT   If the MOLECULE functions as part of a larger biological
                  unit, the entire functional unit may be described.
OTHER_DETAILS     Additional comments.
* In general, the PDB tends to reflect the biological/functional view of the molecule. For example, the hemoglobin hetero-tetramer is treated as a single component in COMPND.
* For synthetic molecules, e.g., hybrids, the depositor provides the description.
* No specific rules apply to the ordering of the tokens, except that the occurrence of MOL_ID or FRAGMENT indicates that the subsequent tokens are related to that specific molecule or fragment of the molecule.
* Physical layout of these items may be altered by PDB staff to improve human readability of the COMPND record.
* Asterisks in nucleic acid names (in MOLECULE) are for ease of reading.
* When insertion codes are given as part of the residue name, they must be enclosed in square brackets, e.g., H57[A]N. This might occur when listing residues in FRAGMENT, MUTATION, or OTHER_DETAILS.
* For multi-chain molecules, e.g., the hemoglobin tetramer, a comma-separated list of CHAIN identifiers is used.
* When non-blank chain identifiers occur in the entry, they must be specified.
* NULL is used to indicate blank chain identifiers. E.g., CHAIN: NULL, CHAIN: NULL, B, C.
* For enzymes, if no EC number has been assigned, "EC: NOT ASSIGNED" is used.
* ENGINEERED is followed either by "YES" or by a comment.
* For the token MUTATION, the following examples illustrate the conventions the PDB uses to represent various types of mutations.
MUTATION TYPE         DESCRIPTION                                      FORM
------------------------------------------------------------------------------
Simple substitution   His 57 replaced by Asn                           H57N
                      His 57A replaced by Asn, in chain C only         Chain C, H57[A]N
Insertion             His and Pro inserted before Lys 48               INS(HP-K48)
Deletion              Arg 141 of chains A and C deleted, not deleted   Chain A, C, DEL(R141)
                      in chain B
                      His 23 through Arg 26 deleted                    DEL(23-26)
                      His 23C and Arg 26 deleted from chain B only     Chain B, DEL(H23[C],R26)
* When there are more than ten mutations:
- All the mutations are listed in the SEQADV record.
- Some mutations may be listed in MUTATION in COMPND to highlight the most important ones, at the depositor's discretion.
* New tokens may be added by the PDB as needed.
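Taken together, these rules suggest how a reader can reassemble the components: concatenate the specification text of the COMPND lines in continuation order, split it at semicolons into token: value pairs, and start a new component at each MOL_ID. The following is a minimal Python sketch under stated assumptions: `parse_compnd` and `chain_ids` are hypothetical names; continuation lines are rejoined with a single space, which suits wrapping at word boundaries as in the hemoglobin example; values are assumed not to contain semicolons; and FRAGMENT sub-grouping is not modeled, so a repeated token within one component overwrites the earlier value.

```python
from typing import Dict, List


def parse_compnd(spec_texts: List[str]) -> List[Dict[str, str]]:
    """Group COMPND token: value pairs into one dict per MOL_ID.

    spec_texts: the specification text (columns 11-70) of each COMPND
    line, already sorted by continuation number.
    Assumes values never contain ';'. FRAGMENT sub-grouping is ignored,
    so a token repeated within one component overwrites the earlier value.
    """
    # Wrapped lines break at word boundaries, so rejoin them with a space.
    spec = " ".join(t.strip() for t in spec_texts)
    components: List[Dict[str, str]] = []
    for item in spec.split(";"):
        # Split at the first colon only, so values may themselves contain ':'.
        token, sep, value = item.partition(":")
        if not sep:
            continue  # e.g. the empty piece after a trailing ';'
        token, value = token.strip(), value.strip()
        if token == "MOL_ID" or not components:
            components.append({})  # MOL_ID opens a new component
        components[-1][token] = value
    return components


def chain_ids(value: str) -> List[str]:
    """Split a CHAIN value; NULL stands for a blank (space) chain identifier."""
    return [" " if c.strip() == "NULL" else c.strip() for c in value.split(",")]


comps = parse_compnd([
    "MOL_ID: 1;",
    "MOLECULE: HEVAMINE A;",
    "CHAIN: NULL;",
    "EC: 3.2.1.14, 3.2.1.17;",
    "OTHER_DETAILS: PLANT ENDOCHITINASE/LYSOZYME",
])
print(comps[0]["EC"], chain_ids(comps[0]["CHAIN"]))
```

Splitting at the first colon only keeps values such as ratios intact, while MOL_ID alone decides where a new component begins, matching the ordering rule above.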
Verification/Validation/Value Authority Control
CHAIN must match the chain identifier(s) of the molecule(s). EC numbers are checked against the Enzyme Data Bank.
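A short sketch cannot reproduce the Enzyme Data Bank lookup, but the syntactic side of these checks is easy to illustrate. The following Python sketch assumes an EC number is four dot-separated fields (a number first, then numbers or "-"), accepts the literal NOT ASSIGNED described above, and compares chain identifiers as sets. `ec_value_ok` and `chain_mismatch` are hypothetical helper names.

```python
import re
from typing import Iterable, Set

# Assumed EC syntax: four dot-separated fields, the first a number and the
# rest numbers or "-". This is a syntax check only, not an Enzyme Data Bank lookup.
EC_PATTERN = re.compile(r"\d+(?:\.(?:\d+|-)){3}")


def ec_value_ok(value: str) -> bool:
    """Accept NOT ASSIGNED or a comma-separated list of well-formed EC numbers."""
    if value.strip() == "NOT ASSIGNED":
        return True
    return all(EC_PATTERN.fullmatch(ec.strip()) for ec in value.split(","))


def chain_mismatch(compnd_chains: Iterable[str], entry_chains: Iterable[str]) -> Set[str]:
    """Chain IDs listed in COMPND but absent from the entry, or the reverse."""
    return set(compnd_chains) ^ set(entry_chains)


print(ec_value_ok("3.2.1.14, 3.2.1.17"), chain_mismatch(["A", "B", "C"], {"A", "B", "C", "D"}))
```

An empty result from `chain_mismatch` means the COMPND chain list and the entry's chains agree.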
Relationships to Other Record Types
Each molecule given a MOL_ID in COMPND must be listed, with its biological source information, in SOURCE. For mutants, the SEQADV records present the differences from the reference molecule. REMARK records may further describe the contents of the entry. See also Verification/Validation/Value Authority Control above.
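The MOL_ID link between COMPND and SOURCE lends itself to a mechanical cross-check. The following is a minimal Python sketch, assuming both records have already been parsed into one dict of token: value pairs per component, with SOURCE grouped by MOL_ID in the same way as COMPND; `missing_source_mol_ids` is a hypothetical name.

```python
from typing import Dict, List, Set


def missing_source_mol_ids(
    compnd: List[Dict[str, str]], source: List[Dict[str, str]]
) -> Set[str]:
    """MOL_IDs that appear in COMPND but have no matching SOURCE component."""
    source_ids = {c["MOL_ID"] for c in source if "MOL_ID" in c}
    return {c["MOL_ID"] for c in compnd if "MOL_ID" in c} - source_ids


print(missing_source_mol_ids([{"MOL_ID": "1"}, {"MOL_ID": "2"}], [{"MOL_ID": "1"}]))
```

A non-empty result flags molecules whose biological source information is missing from SOURCE.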
Example
         1         2         3         4         5         6         7
1234567890123456789012345678901234567890123456789012345678901234567890
COMPND    MOL_ID: 1;
COMPND   2 MOLECULE: HEMOGLOBIN;
COMPND   3 CHAIN: A, B, C, D;
COMPND   4 ENGINEERED: YES;
COMPND   5 MUTATION: CHAIN B, D, V1A;
COMPND   6 BIOLOGICAL_UNIT: HEMOGLOBIN EXISTS AS AN A1B1/A2B2
COMPND   7 TETRAMER;
COMPND   8 OTHER_DETAILS: DEOXY FORM

COMPND    MOL_ID: 1;
COMPND   2 MOLECULE: COWPEA CHLOROTIC MOTTLE VIRUS;
COMPND   3 CHAIN: A, B, C;
COMPND   4 SYNONYM: CCMV;
COMPND   5 MOL_ID: 2;
COMPND   6 MOLECULE: RNA (5'-(*AP*UP*AP*U)-3');
COMPND   7 CHAIN: D, F;
COMPND   8 ENGINEERED: YES;
COMPND   9 MOL_ID: 3;
COMPND  10 MOLECULE: RNA (5'-(*AP*U)-3');
COMPND  11 CHAIN: E;
COMPND  12 ENGINEERED: YES

COMPND    MOL_ID: 1;
COMPND   2 MOLECULE: HEVAMINE A;
COMPND   3 CHAIN: NULL;
COMPND   4 EC: 3.2.1.14, 3.2.1.17;
COMPND   5 OTHER_DETAILS: PLANT ENDOCHITINASE/LYSOZYME
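Going the other way, the layout in these examples can be produced mechanically. The first line has a blank continuation field with text from column 11, later lines are numbered in columns 9-10 with text from column 12, and nothing extends past column 70. The following is a minimal Python sketch, assuming one token per line as in the examples, a semicolon after every pair except the last, and word-wrapping of values that do not fit; `format_compnd` is a hypothetical name, and a group longer than 99 lines would overflow the two-column continuation field.

```python
import textwrap
from typing import List, Tuple


def format_compnd(pairs: List[Tuple[str, str]]) -> List[str]:
    """Lay out token: value pairs as COMPND lines, one token per line."""
    items = [
        f"{token}: {value}" + (";" if i < len(pairs) - 1 else "")
        for i, (token, value) in enumerate(pairs)
    ]
    texts: List[str] = []
    for item in items:
        # Columns 12-70 leave 59 characters. Do not break at hyphens, since
        # nucleic acid names such as RNA (5'-(*AP*U)-3') contain them.
        texts.extend(textwrap.wrap(item, width=59, break_on_hyphens=False))
    lines = []
    for n, text in enumerate(texts, start=1):
        if n == 1:
            lines.append(f"COMPND    {text}")       # blank continuation, text at col 11
        else:
            lines.append(f"COMPND  {n:>2} {text}")  # number in cols 9-10, text at col 12
    return lines


for line in format_compnd([
    ("MOL_ID", "1"),
    ("MOLECULE", "HEVAMINE A"),
    ("CHAIN", "NULL"),
    ("EC", "3.2.1.14, 3.2.1.17"),
    ("OTHER_DETAILS", "PLANT ENDOCHITINASE/LYSOZYME"),
]):
    print(line)
```

Given the hemoglobin token list, the same function wraps BIOLOGICAL_UNIT after A1B1/A2B2 and continues with TETRAMER; on line 7, matching the first example.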