Overview
The SOURCE record specifies the biological and/or chemical source of each biological molecule in the entry. Sources are described by both the common name and the scientific name, e.g., genus and species. The strain and/or cell line (for immortalized cells) is given when it helps to uniquely identify the biological entity studied.
Record Format
COLUMNS       DATA TYPE       FIELD          DEFINITION
----------------------------------------------------------------------------------
 1 -  6       Record name     "SOURCE"
 9 - 10       Continuation    continuation   Allows concatenation of multiple records.
11 - 70       Specification   srcName        Identifies the source of the macromolecule
              list                           in a token: value format.
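The column layout above can be read with plain string slicing. The following is a minimal sketch, not an official parser: the function names split_source_line and join_source_records are hypothetical, and it assumes that a blank continuation field marks the first line of the record and that continuation lines are joined with a single space.

```python
def split_source_line(line):
    """Split one SOURCE line into (continuation number, specification text).

    Columns are 1-based in the format description, so columns 1-6 are
    line[0:6], columns 9-10 are line[8:10], and columns 11-70 are line[10:70].
    """
    line = line.rstrip("\n").ljust(70)      # pad short lines to full width
    if line[0:6] != "SOURCE":
        raise ValueError("not a SOURCE record")
    continuation = line[8:10].strip()
    # Assumption: a blank continuation field means the first line.
    number = int(continuation) if continuation else None
    return number, line[10:70].strip()


def join_source_records(lines):
    """Concatenate the specification text of all SOURCE lines, in file order."""
    parts = [split_source_line(l)[1] for l in lines if l.startswith("SOURCE")]
    return " ".join(parts)


if __name__ == "__main__":
    records = [
        "SOURCE    MOL_ID: 1;",
        "SOURCE   2 ORGANISM_SCIENTIFIC: GALLUS GALLUS;",
    ]
    print(join_source_records(records))
```

Slicing by fixed position rather than splitting on whitespace keeps the sketch faithful to the column definitions, since values may themselves contain spaces.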
Details
TOKEN                                VALUE DEFINITION
---------------------------------------------------------------------------------
MOL_ID                               Numbers each molecule. Same as appears in
                                     COMPND.
SYNTHETIC                            Indicates a chemically-synthesized source.
FRAGMENT                             A domain or fragment of the molecule may be
                                     specified.
ORGANISM_SCIENTIFIC                  Scientific name of the organism.
ORGANISM_COMMON                      Common name of the organism.
STRAIN                               Identifies the strain.
VARIANT                              Identifies the variant.
CELL_LINE                            The specific line of cells used in the
                                     experiment.
ATCC                                 American Type Culture Collection tissue
                                     culture number.
ORGAN                                Organized group of tissues that carries on
                                     a specialized function.
TISSUE                               Organized group of cells with a common
                                     function and structure.
CELL                                 Identifies the particular cell type.
ORGANELLE                            Organized structure within a cell.
SECRETION                            Identifies the secretion, such as saliva,
                                     urine, or venom, from which the molecule
                                     was isolated.
CELLULAR_LOCATION                    Identifies the location inside (or outside)
                                     the cell.
PLASMID                              Identifies the plasmid containing the gene.
GENE                                 Identifies the gene.
EXPRESSION_SYSTEM                    System used to express recombinant
                                     macromolecules.
EXPRESSION_SYSTEM_STRAIN             Strain of the organism in which the
                                     molecule was expressed.
EXPRESSION_SYSTEM_VARIANT            Variant of the organism used as the
                                     expression system.
EXPRESSION_SYSTEM_CELL_LINE          The specific line of cells used as the
                                     expression system.
EXPRESSION_SYSTEM_ATCC_NUMBER        Identifies the ATCC number of the
                                     expression system.
EXPRESSION_SYSTEM_ORGAN              Specific organ which expressed the molecule.
EXPRESSION_SYSTEM_TISSUE             Specific tissue which expressed the
                                     molecule.
EXPRESSION_SYSTEM_CELL               Specific cell type which expressed the
                                     molecule.
EXPRESSION_SYSTEM_ORGANELLE          Specific organelle which expressed the
                                     molecule.
EXPRESSION_SYSTEM_CELLULAR_LOCATION  Identifies the location inside or outside
                                     the cell which expressed the molecule.
EXPRESSION_SYSTEM_VECTOR_TYPE        Identifies the type of vector used, e.g.,
                                     plasmid, virus, or cosmid.
EXPRESSION_SYSTEM_VECTOR             Identifies the vector used.
EXPRESSION_SYSTEM_PLASMID            Plasmid used in the recombinant experiment.
EXPRESSION_SYSTEM_GENE               Name of the gene used in recombinant
                                     experiment.
OTHER_DETAILS                        Used to present information on the source
                                     which is not given elsewhere.
* The srcName is a list of token: value pairs describing each biological component of the entry.
* As in COMPND, the order is not specified except that MOL_ID or FRAGMENT indicates subsequent specifications are related to that molecule or fragment of the molecule.
* Physical layout of these items may be altered by PDB staff to improve human readability of the SOURCE record.
* Only the relevant tokens need to appear in an entry.
* Molecules prepared by purely chemical synthetic methods are described by the specification SYNTHETIC followed by "YES" or an optional value, such as NON-BIOLOGICAL SOURCE or BASED ON THE NATURAL SEQUENCE. ENGINEERED must appear in the COMPND record.
* In the case of a chemically synthesized molecule using a biologically functional sequence (nucleic or amino acid), SOURCE reflects the biological origin of the sequence and COMPND reflects its synthetic nature by inclusion of the token ENGINEERED. The token SYNTHETIC appears in SOURCE.
* If made from a synthetic gene, ENGINEERED appears in COMPND and the expression system is described in SOURCE (SYNTHETIC does NOT appear in SOURCE).
* If the molecule was made using recombinant techniques, ENGINEERED appears in COMPND and the system is described in SOURCE.
* When multiple macromolecules appear in the entry, each MOL_ID, as given in the COMPND record, must be repeated in the SOURCE record along with the source information for the corresponding molecule.
* Hybrid molecules prepared by fusion of genes are treated as multi-molecular systems for the purpose of specifying the source. The token FRAGMENT is used to associate the source with its corresponding fragment.
- When necessary to fully describe hybrid molecules, tokens may appear more than once for a given MOL_ID.
- All relevant token: value pairs that, taken together, fully describe each fragment are grouped following the appropriate FRAGMENT.
- Descriptors relative to the full system appear before the FRAGMENT (see Example 3 below).
* ORGANISM_SCIENTIFIC provides the Latin genus and species. Virus names are listed as the scientific name.
* Cellular origin is described by giving cellular compartment, organelle, cell, tissue, organ, or body part from which the molecule was isolated.
* CELLULAR_LOCATION may be used to indicate where in the organism the compound was found. Examples are: extracellular, periplasmic, cytosol.
* Entries containing molecules prepared by recombinant techniques are described as follows:
- The expression system is described.
- The organism and cell location given are for the source of the gene used in the cloning experiment.
- Transgenic organisms, such as mice producing human proteins, are treated as expression systems.
* For a theoretical modelling experiment, SOURCE describes the modelled compound just as though it were an experimental study.
* New tokens may be added by the PDB.
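The grouping rules above (MOL_ID opens a molecule, FRAGMENT opens a fragment within it, and descriptors for the full system precede the first FRAGMENT) can be illustrated with a short sketch. The function names parse_source_spec and split_fragments are hypothetical. The sketch assumes the continuation lines have already been concatenated into one string, that pairs are separated by semicolons, and that values contain no semicolons of their own.

```python
def parse_source_spec(text):
    """Group token: value pairs by MOL_ID, keeping their original order.

    Pairs are stored as a list, not a dict, because a token may appear
    more than once for a given MOL_ID (hybrid molecules).
    """
    molecules = []
    current = None
    for item in text.split(";"):
        item = item.strip()
        if not item:
            continue
        token, sep, value = item.partition(":")
        if not sep:
            raise ValueError(f"not a token: value pair: {item!r}")
        token, value = token.strip(), value.strip()
        if token == "MOL_ID":
            current = {"MOL_ID": value, "pairs": []}
            molecules.append(current)
        elif current is None:
            raise ValueError(f"{token} appears before any MOL_ID")
        else:
            current["pairs"].append((token, value))
    return molecules


def split_fragments(pairs):
    """Separate full-system descriptors from per-FRAGMENT descriptors."""
    system, fragments = [], []
    for token, value in pairs:
        if token == "FRAGMENT":
            fragments.append({"FRAGMENT": value, "pairs": []})
        elif fragments:
            fragments[-1]["pairs"].append((token, value))
        else:
            system.append((token, value))   # precedes the first FRAGMENT
    return system, fragments


if __name__ == "__main__":
    spec = ("MOL_ID: 1; EXPRESSION_SYSTEM: ESCHERICHIA COLI; "
            "FRAGMENT: RESIDUES 1-16; "
            "ORGANISM_SCIENTIFIC: BACILLUS AMYLOLIQUEFACIENS; "
            "FRAGMENT: RESIDUES 17-214; "
            "ORGANISM_SCIENTIFIC: BACILLUS MACERANS")
    for mol in parse_source_spec(spec):
        print(mol["MOL_ID"], split_fragments(mol["pairs"]))
```

Because new tokens may be added, the sketch does not check token names against a fixed list.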
Verification/Validation/Value Authority Control
The biological source is compared to that found in the sequence database. Common and scientific names are checked against the "Annotated Classification of Source Organisms: PIR-International Protein Sequence Database" compiled by Andrzej Elzanowski and available from the PDB.
Relationships to Other Record Types
Each macromolecule listed in COMPND must have a corresponding source.
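This relationship can be checked mechanically by comparing the MOL_ID values in the two records. The sketch below is illustrative only: the function names are hypothetical, both inputs are assumed to be the already-concatenated specification text of the COMPND and SOURCE records, and the molecule names in the usage example are made up.

```python
import re


def mol_ids(spec_text):
    """Return the MOL_ID values found in a specification string, in order."""
    return re.findall(r"\bMOL_ID:\s*([^;\s]+)", spec_text)


def missing_sources(compnd_text, source_text):
    """List MOL_IDs that appear in COMPND but have no entry in SOURCE."""
    source_ids = set(mol_ids(source_text))
    return [m for m in mol_ids(compnd_text) if m not in source_ids]


if __name__ == "__main__":
    # Hypothetical record contents, for illustration only.
    compnd = "MOL_ID: 1; MOLECULE: PROTEIN A; MOL_ID: 2; MOLECULE: PROTEIN B"
    source = "MOL_ID: 1; ORGANISM_SCIENTIFIC: GALLUS GALLUS"
    print(missing_sources(compnd, source))
```

An empty list means every molecule listed in COMPND has a corresponding source.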
Example
         1         2         3         4         5         6         7
1234567890123456789012345678901234567890123456789012345678901234567890
SOURCE    MOL_ID: 1;
SOURCE   2 ORGANISM_SCIENTIFIC: AVIAN SARCOMA VIRUS;
SOURCE   3 STRAIN: SCHMIDT-RUPPIN B;
SOURCE   4 EXPRESSION_SYSTEM: ESCHERICHIA COLI;
SOURCE   5 EXPRESSION_SYSTEM_PLASMID: PRC23IN

SOURCE    MOL_ID: 1;
SOURCE   2 ORGANISM_SCIENTIFIC: GALLUS GALLUS;
SOURCE   3 ORGANISM_COMMON: CHICKEN;
SOURCE   4 ORGAN: HEART;
SOURCE   5 TISSUE: MUSCLE

SOURCE    MOL_ID: 1;
SOURCE   2 EXPRESSION_SYSTEM: ESCHERICHIA COLI;
SOURCE   3 EXPRESSION_SYSTEM_STRAIN: BE167;
SOURCE   4 FRAGMENT: RESIDUES 1-16;
SOURCE   5 ORGANISM_SCIENTIFIC: BACILLUS AMYLOLIQUEFACIENS;
SOURCE   6 EXPRESSION_SYSTEM: ESCHERICHIA COLI;
SOURCE   7 FRAGMENT: RESIDUES 17-214;
SOURCE   8 ORGANISM_SCIENTIFIC: BACILLUS MACERANS
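Records laid out like the chicken heart muscle example can also be generated from a list of token: value pairs. This is a rough sketch with a hypothetical function name, format_source_records. It assumes one pair per line, a right-justified continuation number ending in column 10, text starting in column 12 on continuation lines, and no semicolon after the final pair. It does not wrap values that are too long for one line.

```python
def format_source_records(pairs):
    """Render (token, value) pairs as SOURCE lines, one pair per line."""
    if len(pairs) > 99:
        raise ValueError("continuation field (columns 9-10) holds two digits")
    lines = []
    for n, (token, value) in enumerate(pairs, start=1):
        is_last = n == len(pairs)
        text = f"{token}: {value}" + ("" if is_last else ";")
        if n == 1:
            line = f"SOURCE    {text}"       # blank continuation field
        else:
            line = f"SOURCE{n:>4} {text}"    # number ends in column 10
        if len(line) > 70:
            raise ValueError(f"line exceeds column 70: {line!r}")
        lines.append(line)
    return lines


if __name__ == "__main__":
    pairs = [
        ("MOL_ID", "1"),
        ("ORGANISM_SCIENTIFIC", "GALLUS GALLUS"),
        ("ORGANISM_COMMON", "CHICKEN"),
        ("ORGAN", "HEART"),
        ("TISSUE", "MUSCLE"),
    ]
    print("\n".join(format_source_records(pairs)))
```

Raising an error on over-long lines, instead of silently truncating, avoids writing a record that drops part of a value.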