SOURCE

Overview

The SOURCE record specifies the biological and/or chemical source of each biological molecule in the entry. Sources are described by both the common name and the scientific name, e.g., genus and species. Strain and/or cell-line for immortalized cells are given when they help to uniquely identify the biological entity studied.

Record Format

COLUMNS        DATA TYPE         FIELD          DEFINITION
----------------------------------------------------------------------------------
 1 -  6        Record name       "SOURCE"

 9 - 10        Continuation      continuation   Allows concatenation of multiple
                                                records.

11 - 70        Specification     srcName        Identifies the source of the
               list                             macromolecule in a token: value
                                                format.
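
The column layout above can be read with plain string slicing. The following is a minimal sketch, not an official parser: the function names split_source_record and join_source_records are hypothetical, and joining wrapped records with a single space is an assumption about how continuation text should be reassembled.

```python
def split_source_record(line):
    """Return (continuation, specification) for one SOURCE line."""
    if line[0:6] != "SOURCE":
        raise ValueError("not a SOURCE record")
    # Columns 9-10 (1-based) hold the continuation field; it is blank
    # on the first record in the examples of this section.
    continuation = line[8:10].strip()
    # Columns 11-70 hold the specification list.
    specification = line[10:70].strip()
    return continuation, specification


def join_source_records(lines):
    """Concatenate the specification fields of all SOURCE lines.

    Assumption: a single space between records is an acceptable joiner.
    """
    specs = [split_source_record(l)[1] for l in lines if l.startswith("SOURCE")]
    return " ".join(s for s in specs if s)


records = [
    "SOURCE    MOL_ID: 1;",
    "SOURCE   2 ORGANISM_SCIENTIFIC: GALLUS GALLUS;",
    "SOURCE   3 ORGANISM_COMMON: CHICKEN;",
]
joined = join_source_records(records)
```

Slicing by column rather than splitting on whitespace keeps the continuation number out of the specification text.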

Details

TOKEN                                VALUE DEFINITION
---------------------------------------------------------------------------------
MOL_ID                               Numbers each molecule.  Same as appears in
                                     COMPND.

SYNTHETIC                            Indicates a chemically-synthesized source.

FRAGMENT                             A domain or fragment of the molecule may be
                                     specified.

ORGANISM_SCIENTIFIC                  Scientific name of the organism.

ORGANISM_COMMON                      Common name of the organism.

STRAIN                               Identifies the strain.

VARIANT                              Identifies the variant.

CELL_LINE                            The specific line of cells used in the
                                     experiment.

ATCC                                 American Type Culture Collection tissue
                                     culture number.

ORGAN                                Organized group of tissues that carries on
                                     a specialized function.

TISSUE                               Organized group of cells with a common
                                     function and structure.

CELL                                 Identifies the particular cell type.

ORGANELLE                            Organized structure within a cell.

SECRETION                            Identifies the secretion, such as saliva,
                                     urine, or venom, from which the molecule was
                                     isolated.

CELLULAR_LOCATION                    Identifies the location inside (or
                                     outside) the cell.

PLASMID                              Identifies the plasmid containing the gene.

GENE                                 Identifies the gene.

EXPRESSION_SYSTEM                    System used to express recombinant
                                     macromolecules.

EXPRESSION_SYSTEM_STRAIN             Strain of the organism in which the molecule
                                     was expressed.

EXPRESSION_SYSTEM_VARIANT            Variant of the organism used as the
                                     expression system.

EXPRESSION_SYSTEM_CELL_LINE          The specific line of cells used as the
                                     expression system.

EXPRESSION_SYSTEM_ATCC_NUMBER        Identifies the ATCC number of the expression
                                     system.

EXPRESSION_SYSTEM_ORGAN              Specific organ which expressed the molecule.

EXPRESSION_SYSTEM_TISSUE             Specific tissue which expressed the molecule.

EXPRESSION_SYSTEM_CELL               Specific cell type which expressed the
                                     molecule.

EXPRESSION_SYSTEM_ORGANELLE          Specific organelle which expressed the
                                     molecule.

EXPRESSION_SYSTEM_CELLULAR_LOCATION  Identifies the location inside or outside
                                     the cell which expressed the molecule.

EXPRESSION_SYSTEM_VECTOR_TYPE        Identifies the type of vector used, e.g.,
                                     plasmid, virus, or cosmid.

EXPRESSION_SYSTEM_VECTOR             Identifies the vector used.

EXPRESSION_SYSTEM_PLASMID            Plasmid used in the recombinant experiment.

EXPRESSION_SYSTEM_GENE               Name of the gene used in recombinant
                                     experiment.

OTHER_DETAILS                        Used to present information on the source
                                     which is not given elsewhere.
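
A program reading SOURCE records may want to flag tokens that are not in the table above. The sketch below is illustrative only: KNOWN_TOKENS and unknown_tokens are hypothetical names, and an unrecognized token is better treated as a warning than as an error, since the token list is not closed.

```python
# Tokens copied from the table above.
KNOWN_TOKENS = {
    "MOL_ID", "SYNTHETIC", "FRAGMENT",
    "ORGANISM_SCIENTIFIC", "ORGANISM_COMMON",
    "STRAIN", "VARIANT", "CELL_LINE", "ATCC",
    "ORGAN", "TISSUE", "CELL", "ORGANELLE", "SECRETION",
    "CELLULAR_LOCATION", "PLASMID", "GENE",
    "EXPRESSION_SYSTEM",
    "EXPRESSION_SYSTEM_STRAIN", "EXPRESSION_SYSTEM_VARIANT",
    "EXPRESSION_SYSTEM_CELL_LINE", "EXPRESSION_SYSTEM_ATCC_NUMBER",
    "EXPRESSION_SYSTEM_ORGAN", "EXPRESSION_SYSTEM_TISSUE",
    "EXPRESSION_SYSTEM_CELL", "EXPRESSION_SYSTEM_ORGANELLE",
    "EXPRESSION_SYSTEM_CELLULAR_LOCATION",
    "EXPRESSION_SYSTEM_VECTOR_TYPE", "EXPRESSION_SYSTEM_VECTOR",
    "EXPRESSION_SYSTEM_PLASMID", "EXPRESSION_SYSTEM_GENE",
    "OTHER_DETAILS",
}


def unknown_tokens(tokens):
    """Return the tokens that do not appear in the table, in input order."""
    return [t for t in tokens if t not in KNOWN_TOKENS]


# "HOST" is a made-up token used only to show the behavior.
flagged = unknown_tokens(["MOL_ID", "ORGANISM_SCIENTIFIC", "HOST"])
```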

* The srcName is a list of token: value pairs describing each biological component of the entry.

* As in COMPND, the order is not specified except that MOL_ID or FRAGMENT indicates subsequent specifications are related to that molecule or fragment of the molecule.

* Physical layout of these items may be altered by PDB staff to improve human readability of the SOURCE record.

* Only the relevant tokens need to appear in an entry.

* Molecules prepared by purely chemical synthetic methods are described by the specification SYNTHETIC followed by "YES" or an optional value, such as NON-BIOLOGICAL SOURCE or BASED ON THE NATURAL SEQUENCE. ENGINEERED must appear in the COMPND record.

* In the case of a chemically synthesized molecule using a biologically functional sequence (nucleic or amino acid), SOURCE reflects the biological origin of the sequence and COMPND reflects its synthetic nature by inclusion of the token ENGINEERED. The token SYNTHETIC appears in SOURCE.

* If made from a synthetic gene, ENGINEERED appears in COMPND and the expression system is described in SOURCE (SYNTHETIC does NOT appear in SOURCE).

* If the molecule was made using recombinant techniques, ENGINEERED appears in COMPND and the system is described in SOURCE.

* When multiple macromolecules appear in the entry, each MOL_ID, as given in the COMPND record, must be repeated in the SOURCE record along with the source information for the corresponding molecule.

* Hybrid molecules prepared by fusion of genes are treated as multi-molecular systems for the purpose of specifying the source. The token FRAGMENT is used to associate the source with its corresponding fragment.

- When necessary to fully describe hybrid molecules, tokens may appear more than once for a given MOL_ID.
- All relevant token: value pairs that taken together fully describe each fragment are grouped following the appropriate FRAGMENT.
- Descriptors relative to the full system appear before the FRAGMENT (see Example 3 below).

* ORGANISM_SCIENTIFIC provides the Latin genus and species. Virus names are listed as the scientific name.

* Cellular origin is described by giving cellular compartment, organelle, cell, tissue, organ, or body part from which the molecule was isolated.

* CELLULAR_LOCATION may be used to indicate where in the organism the compound was found. Examples are: extracellular, periplasmic, cytosol.

* Entries containing molecules prepared by recombinant techniques are described as follows:

- The expression system is described.
- The organism and cell location given are for the source of the gene used in the cloning experiment.
- Transgenic organisms, such as a mouse producing human proteins, are treated as expression systems.

* For a theoretical modelling experiment, SOURCE describes the modelled compound just as though it were an experimental study.

* New tokens may be added by the PDB.
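
The rules above that tie SYNTHETIC and the expression system in SOURCE to ENGINEERED in COMPND can be expressed as a simple consistency check. This is a sketch of one reading of those rules, not a PDB validation procedure. The function name check_engineered_flags is hypothetical, the inputs are assumed to be plain dictionaries of token to value for a single MOL_ID, and treating SYNTHETIC together with an expression system as suspicious is an inference from the synthetic-gene rule.

```python
def check_engineered_flags(source_tokens, compnd_tokens):
    """Return a list of warning strings for one molecule.

    source_tokens / compnd_tokens: dicts of token -> value taken from the
    SOURCE and COMPND records of the same MOL_ID (assumed input shape).
    """
    warnings = []
    synthetic = "SYNTHETIC" in source_tokens
    expressed = any(t.startswith("EXPRESSION_SYSTEM") for t in source_tokens)
    engineered = "ENGINEERED" in compnd_tokens

    # Chemical synthesis and recombinant production both call for
    # ENGINEERED in COMPND.
    if (synthetic or expressed) and not engineered:
        warnings.append("COMPND lacks ENGINEERED")
    # Assumption: a molecule made from a synthetic gene carries an
    # expression system but no SYNTHETIC token, so both together is odd.
    if synthetic and expressed:
        warnings.append("SYNTHETIC given together with an expression system")
    return warnings


ok = check_engineered_flags({"SYNTHETIC": "YES"}, {"ENGINEERED": "YES"})
bad = check_engineered_flags({"EXPRESSION_SYSTEM": "ESCHERICHIA COLI"}, {})
```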

Verification/Validation/Value Authority Control

The biological source is compared to that found in the sequence database. Common and scientific names are checked against the "Annotated Classification of Source Organisms: PIR-International Protein Sequence Database" compiled by Andrzej Elzanowski and available from the PDB.

Relationships to Other Record Types

Each macromolecule listed in COMPND must have a corresponding source.
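
This relationship can be checked mechanically by comparing the MOL_ID values in the two record types. The sketch below assumes that COMPND uses the same column layout as SOURCE (specification in columns 11 - 70) and that MOL_ID values are integers. The function names and the sample lines are made up for illustration.

```python
import re


def mol_ids(record_name, lines):
    """Collect the MOL_ID values found in records named record_name."""
    text = " ".join(l[10:70].strip() for l in lines if l.startswith(record_name))
    return set(re.findall(r"MOL_ID:\s*(\d+)", text))


def missing_sources(lines):
    """Return MOL_IDs present in COMPND but absent from SOURCE."""
    missing = mol_ids("COMPND", lines) - mol_ids("SOURCE", lines)
    return sorted(missing, key=int)


# Hypothetical entry: two molecules in COMPND, only one in SOURCE.
entry = [
    "COMPND    MOL_ID: 1;",
    "COMPND   2 MOLECULE: EXAMPLE PROTEIN A;",
    "COMPND   3 MOL_ID: 2;",
    "COMPND   4 MOLECULE: EXAMPLE PROTEIN B;",
    "SOURCE    MOL_ID: 1;",
    "SOURCE   2 ORGANISM_SCIENTIFIC: GALLUS GALLUS;",
]
missing = missing_sources(entry)
```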

Examples

         1         2         3         4         5         6         7
1234567890123456789012345678901234567890123456789012345678901234567890
SOURCE    MOL_ID: 1;
SOURCE   2 ORGANISM_SCIENTIFIC: AVIAN SARCOMA VIRUS;
SOURCE   3 STRAIN: SCHMIDT-RUPPIN B;
SOURCE   4 EXPRESSION_SYSTEM: ESCHERICHIA COLI;
SOURCE   5 EXPRESSION_SYSTEM_PLASMID: PRC23IN

SOURCE    MOL_ID: 1;
SOURCE   2 ORGANISM_SCIENTIFIC: GALLUS GALLUS;
SOURCE   3 ORGANISM_COMMON: CHICKEN;
SOURCE   4 ORGAN: HEART;
SOURCE   5 TISSUE: MUSCLE

SOURCE    MOL_ID: 1;
SOURCE   2 EXPRESSION_SYSTEM: ESCHERICHIA COLI;
SOURCE   3 EXPRESSION_SYSTEM_STRAIN: BE167;
SOURCE   4 FRAGMENT: RESIDUES 1-16;
SOURCE   5 ORGANISM_SCIENTIFIC: BACILLUS AMYLOLIQUEFACIENS;
SOURCE   6 EXPRESSION_SYSTEM: ESCHERICHIA COLI;
SOURCE   7 FRAGMENT: RESIDUES 17-214;
SOURCE   8 ORGANISM_SCIENTIFIC: BACILLUS MACERANS
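
The grouping used in the last example, where descriptors of the full system come first and each FRAGMENT opens a new group, can be sketched as follows. This is an illustrative reading of the layout for a single MOL_ID, not a complete parser. The names parse_pairs and group_by_fragment are hypothetical, and the code assumes that values contain no semicolons.

```python
def parse_pairs(lines):
    """Return (token, value) pairs from the SOURCE lines, in order."""
    text = " ".join(l[10:70].strip() for l in lines if l.startswith("SOURCE"))
    pairs = []
    for item in text.split(";"):
        if ":" in item:
            token, value = item.split(":", 1)
            pairs.append((token.strip(), value.strip()))
    return pairs


def group_by_fragment(pairs):
    """Split pairs into system-wide descriptors and per-FRAGMENT groups."""
    system, fragments = {}, []
    current = system
    for token, value in pairs:
        if token == "FRAGMENT":
            # Each FRAGMENT starts a new group that collects what follows.
            current = {"FRAGMENT": value}
            fragments.append(current)
        else:
            current[token] = value
    return system, fragments


example = [
    "SOURCE    MOL_ID: 1;",
    "SOURCE   2 EXPRESSION_SYSTEM: ESCHERICHIA COLI;",
    "SOURCE   3 EXPRESSION_SYSTEM_STRAIN: BE167;",
    "SOURCE   4 FRAGMENT: RESIDUES 1-16;",
    "SOURCE   5 ORGANISM_SCIENTIFIC: BACILLUS AMYLOLIQUEFACIENS;",
    "SOURCE   6 EXPRESSION_SYSTEM: ESCHERICHIA COLI;",
    "SOURCE   7 FRAGMENT: RESIDUES 17-214;",
    "SOURCE   8 ORGANISM_SCIENTIFIC: BACILLUS MACERANS",
]
system, fragments = group_by_fragment(parse_pairs(example))
```

Here system holds MOL_ID and the two expression-system tokens that precede the first FRAGMENT, and fragments holds one dictionary per FRAGMENT with the organism that follows it.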