GENETIC NOMENCLATURE FOR

Caenorhabditis elegans

 

Genetic nomenclature for Caenorhabditis elegans is supervised by the Caenorhabditis Genetics Center (CGC), as part of a contract from the US NIH National Center for Research Resources. The curator for Genetic Mapping and Genetic Nomenclature is Dr Jonathan Hodgkin (Genetics Unit, Department of Biochemistry, University of Oxford, UK), email: jah@bioch.ox.ac.uk

 

Investigators wishing to register new gene names for C. elegans should note the summary guidelines below.

 

The CGC also assigns specific identifying designations to each laboratory engaged in dedicated long-term genetic research on C. elegans.   Each such laboratory is assigned a lab/strain code, for naming strains, and an allele code, for naming mutations (see: http://biosci.umn.edu/CGC/Nomenclature/code.htm).  Investigators requiring new CGC designations should apply to jah@bioch.ox.ac.uk.

 

SUMMARY GUIDELINES FOR PROPOSING NEW GENE NAMES

 

1.    Gene names must conform to the standard format of 3 letters, hyphen, number.

 

2.    Genes can be named on the basis of a mutant phenotype or on the basis of the predicted protein product or RNA product.

 

3.    If a new gene clearly belongs in an existing gene class (of which more than 1000 now exist), then a new gene number will be assigned after consultation with the laboratory responsible for the gene class in question.  Gene classes and the corresponding assigning laboratories for each gene class are listed on WormBase <http://www.wormbase.org/> and at the CGC <http://biosci.umn.edu/CGC/Nomenclature/genes.htm>.

 

4.    If the establishment of a new gene class name seems more appropriate, then approval for this name must be obtained from the CGC, preferably by e-mail application to the CGC Genetic Map and Nomenclature Curator <jah@bioch.ox.ac.uk>.

 

5.    Gene names based on homology with a previously named gene in another well-studied organism, such as Saccharomyces cerevisiae or Mus musculus, are often appropriate and desirable, especially where there is convincing orthology between genes.

 

6.    Gene names and gene numbering schemes that conform to established nomenclature proposals for particular protein classes are desirable.

 

7.    Gene names that are memorable, informative and simply explained are encouraged.

 

8.    Gene names based solely on RNAi phenotypes are discouraged.

 

9.    Gene names including c (for Caenorhabditis),  ce (for C. elegans), n (for  nematode) or w (for worm) are discouraged.  C. elegans as the organism of origin can be specified with a prefix (Ce-)  if desired.

 

10.  New gene name classes can be assigned in confidence, prior to formal publication or disclosure in an abstract.

 

 

STANDARD GENETIC NOMENCLATURE

RECOMMENDATIONS

 

This summary  is based on the original proposals for C. elegans nomenclature (Horvitz et al., 1979 Mol. Gen. Genet. 175: 129-133), plus additional recommendations that have been distributed in The Worm Breeder's Gazette.

 

 

Genetic loci

Genes are given names consisting of three italicized letters, a hyphen, and an italicized Arabic number, e.g., dpy-5 or let-37 or mlc-3.  The gene name may be followed by an italicized Roman numeral, to indicate the linkage group on which the gene maps, e.g., dpy-5 I or let-37 X or mlc-3 III.

 

For genes defined by mutation, the gene names refer to the mutant phenotype originally detected or most easily scored: dumpy in the case of dpy-5, and lethal in the case of let-37.

 

For genes defined by cloning on the basis of sequence similarity, the gene name refers to the predicted protein product or RNA product:

myosin light chain in the case of mlc-3,

superoxide dismutase in the case of sod-1,

ribosomal RNA in the case of rrn-1.

 

Genes with related properties are usually given the same three-letter name and different numbers. For example, there are three known myosin light chain genes (mlc-1, mlc-2, mlc-3) and more than twenty different dumpy genes (dpy-1, dpy-2, dpy-3, and so on).

 

Genes can be given names corresponding to homologous named genes in other standard genetic organisms. 

Examples:  rnt-1 is the C. elegans ortholog of the Drosophila gene runt.

wrn-1 is the C. elegans ortholog of the human gene WRN1, responsible for Werner’s syndrome.

 

Gene names that are memorable, informative and simply explained are encouraged.

 

Genes in a paralogous set related to a single named gene in another organism are sometimes given the same gene name and number, followed by a distinguishing decimal.   Example: four C. elegans genes homologous to SIR2 in S. cerevisiae have been given the names sir-2.1, sir-2.2, sir-2.3, sir-2.4.

 

Pseudogenes, for which there is good evidence that no functional product is ever generated, can be indicated by adding the optional italic suffix ps to the gene name, as in msp-48ps.  
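
The gene name format described above can be checked mechanically. The following is a minimal sketch in Python, assuming only the rules stated in this section: three lowercase letters, a hyphen, a number, an optional decimal for paralogous sets, and an optional ps suffix. The names GENE_NAME, GeneName and parse_gene_name are illustrative and are not part of any CGC or WormBase tool. Italics cannot be checked in plain text and are ignored.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern: three lowercase letters, hyphen, number,
# optional ".N" paralog decimal (sir-2.1), optional "ps" suffix (msp-48ps).
GENE_NAME = re.compile(r"([a-z]{3})-(\d+)(?:\.(\d+))?(ps)?")


class GeneName(NamedTuple):
    gene_class: str         # three-letter class, e.g. "dpy"
    number: int             # gene number within the class
    paralog: Optional[int]  # distinguishing decimal, if any
    pseudogene: bool        # True when the "ps" suffix is present


def parse_gene_name(name: str) -> Optional[GeneName]:
    """Return the parts of a gene name, or None if it does not fit the format."""
    m = GENE_NAME.fullmatch(name)
    if m is None:
        return None
    gene_class, number, paralog, ps = m.groups()
    return GeneName(
        gene_class,
        int(number),
        int(paralog) if paralog is not None else None,
        ps is not None,
    )


print(parse_gene_name("dpy-5"))
print(parse_gene_name("sir-2.1"))
print(parse_gene_name("msp-48ps"))
```

Names carrying a species prefix such as Ce- or Cb- are deliberately rejected by this sketch; the prefix would need to be stripped first.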

 

Gene names based solely on RNAi phenotypes are discouraged.

 

Gene names including c (for Caenorhabditis),  ce (for C. elegans), n (for  nematode) or w (for worm) are discouraged.

 

 

Gene name conflicts

Gene names that have been established in the published literature and databases should preferably not be changed.

In cases where a gene has received multiple names, one name will be adopted as the main name for the gene.  Other names will continue to be listed in databases.  Whenever possible, name changes or the adoption of a single main name should be made with the approval of all laboratories concerned.

 

Homologous genes

If a homolog of a known C. elegans gene is identified in a related species such as Caenorhabditis briggsae, this can be given the same gene name, preceded by two italic letters referring to the species, and a hyphen. For example, Cb-tra-1 is the name for the C. briggsae homolog of the C. elegans gene tra-1.

The C. elegans homolog of a gene identified and named in another organism can be distinguished by the same convention, using "Ce-" as an optional prefix. For example, Ce-snt-1 defines the C. elegans synaptotagmin gene.

 

Alleles and mutations

Every mutation has a unique designation. Mutations are given names consisting of one or two italicized letters followed by an italicized Arabic number, e.g., e61 or mn138 or st5. The letter prefix refers to the laboratory of isolation, as registered with the Caenorhabditis Genetics Center. There are currently more than 350 registered laboratories. For example, e originally referred to the MRC Laboratory of Molecular Biology (Cambridge, U.K.) and currently refers to the laboratory of J. Hodgkin (University of Oxford), and st refers to the laboratory of R.H. Waterston (Washington University, St. Louis, MO).

 

When gene and mutation names are used together, the mutation name is included in parentheses after the gene name, e.g., dpy-5(e61), let-37(mn138). When unambiguous (e.g., if only one mutation is known for a given gene or if all work on a gene described in a publication used a single mutation cited in a Methods section), gene names are used in preference to mutation names (let-37 rather than mn138 or let-37(mn138)).

 

Optional suffixes indicating characteristics of a mutation can follow a mutation name.  These are usually nonitalicized two-letter codes, e.g., hc17ts, where ts stands for temperature-sensitive, or pk15te, where te stands for transposon-excision.
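
As an illustration of the mutation name format, the sketch below (Python) splits an allele name into laboratory prefix, number and optional two-letter suffix, and separates a combined designation such as dpy-5(e61) into gene and mutation. The function names are hypothetical. The sketch assumes single mutations only, so wild-type designations such as dpy-5(+) and intragenic double mutants are not handled.

```python
import re
from typing import Optional, Tuple

# One or two lowercase letters (laboratory prefix), a number, and an
# optional two-letter descriptive suffix such as "ts" or "te".
ALLELE = re.compile(r"([a-z]{1,2})(\d+)([a-z]{2})?")

# A gene name followed by one mutation name in parentheses, e.g. dpy-5(e61).
GENE_WITH_ALLELE = re.compile(r"([a-z]{3}-\d+)\(([a-z]{1,2}\d+(?:[a-z]{2})?)\)")


def parse_allele(name: str) -> Optional[Tuple[str, int, Optional[str]]]:
    """Return (lab_prefix, number, suffix) or None if the name does not fit."""
    m = ALLELE.fullmatch(name)
    if m is None:
        return None
    prefix, number, suffix = m.groups()
    return prefix, int(number), suffix


def split_gene_allele(text: str) -> Optional[Tuple[str, str]]:
    """Return (gene_name, allele_name) for text such as 'let-37(mn138)'."""
    m = GENE_WITH_ALLELE.fullmatch(text)
    if m is None:
        return None
    return m.group(1), m.group(2)


print(parse_allele("hc17ts"))
print(split_gene_allele("let-37(mn138)"))
```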

 

Mutations created by in vitro mutagenesis should receive standard allele names. For cases where a pre-existing genomic mutation is re-created by in vitro mutagenesis, it is still desirable to give the new mutation a new name.

 

The wild-type allele of a gene is defined as that present in the Bristol N2 strain, stored frozen at the CGC and other locations. Wild-type alleles can be designated by a plus sign immediately after the gene name, dpy-5+, or, more commonly, by including the plus sign in parentheses, dpy-5(+).

 

Modifiers:  suppressors, revertants and enhancers

There is no special nomenclature for modifier mutations. Many extragenic suppressor loci are called sup (40 loci defined so far, with a wide variety of properties and mechanisms). An increasing number of more specific modifier gene classes have been established, such as smu (suppressor of mec and unc), smg (suppressor with morphogenetic effect on genitalia) and sel (suppressor/enhancer of lin-12).

 

Intragenic suppressors or modifiers are indicated by adding a second mutation name within parentheses; for example, unc-17(e245e2608) is an intragenic partial revertant of unc-17(e245).

 

Mutations known to be chromosomal rearrangements, rather than intragenic lesions, are named differently, as described below.

 

DNA sequences

There are no specific recommendations for designating cloned sequences that are not similar to known genes. Most genomic clones have been provided by the C. elegans mapping/sequencing consortium (based at the Sanger Centre, Cambridge, UK, and the Genome Sequencing Center, St. Louis, USA). Cosmid clones generated by the consortium are named on the basis of the vector, either pJB8 (initial letters B, C, D, E, R, M, ZC) or a Lorist vector (initial letters K, T, W, F, ZK). Phage clones (in Lambda 2001) are identified by the initial letters A, ZL, YSL.

 

YACs (yeast artificial chromosome clones) are identified by the initial letter Y, e.g., Y3D5.   YAC subsequences may be given names derived from the initial YAC name.  Example:  subsequences derived from the YAC Y47H9  have been called Y47H9A, Y47H9B, Y47H9C.  Note that physical clones corresponding to these subsequences are not available.

 

Genomic DNA clones that have not been generated by the consortium are usually designated by the laboratory strain designation (see below), a # symbol and an isolation number, e.g., MT#JAL6.

 

Sequences that are predicted to be genes from sequence data alone are initially named by the consortium on the basis of the sequenced cosmid, plus a number. For example, the genes predicted for the cosmid T05G3 are called T05G3.1, T05G3.2, etc. (numbered in arbitrary order of definition). Such names can be superseded by standard 3-letter names when this becomes appropriate. Thus, R13F6.3 has been given the name srg-12 (for serpentine receptor, class gamma).
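
A predicted gene name of this kind can be separated into its clone name and gene index by splitting at the final full stop. The Python sketch below shows one way to do this; split_predicted_gene is a hypothetical helper, and the check is limited to what the text states (a clone name, a full stop, a number).

```python
from typing import Tuple


def split_predicted_gene(name: str) -> Tuple[str, int]:
    """Split a name such as 'T05G3.1' into (clone, gene index)."""
    clone, sep, index = name.rpartition(".")
    if not sep or not clone or not index.isdigit():
        raise ValueError(f"not a clone-based gene name: {name!r}")
    return clone, int(index)


print(split_predicted_gene("T05G3.1"))
print(split_predicted_gene("R13F6.3"))
```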

 

EST (Expressed Sequence Tag) clones have received names with prefixes such as cm and yk.

 

 

RFLPs and SNPs

Polymorphic sites, which are mostly RFLPs (restriction fragment length polymorphisms) or SNPs (single nucleotide polymorphisms), are designated by an italic letter P and an italic number, preceded by the allele prefix for the laboratory responsible for identifying the site.

Examples: stP17 and stP196 are RFLPs identified in the laboratory of R. H. Waterston,

amP9  and amP15 are SNPs identified in the laboratory of K. Kornfeld.

 

Transgenes

Transformation of C. elegans with exogenous DNA by microinjection usually leads to the formation of a transmissible extrachromosomal array containing many copies of the introduced DNA.  Sometimes chromosomal integration of the introduced DNA can occur, or an existing extrachromosomal array can be integrated after irradiation of a transgenic line.

 

Extrachromosomal arrays are given italicized names consisting of the laboratory allele prefix, the two letters Ex, and a number.

 

Integrated transgenes are designated by italicized names consisting of the laboratory allele prefix, the two letters Is, and a number.

 

Both Ex and Is can optionally be followed by genotypic or molecular information describing the transgene, in square brackets. For example, eEx3 or eIs2 or stEx5[sup-7(st5) unc-22(+)].
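
The transgene naming rules can be expressed as a single pattern. The Python sketch below is illustrative only: it assumes a one- or two-letter laboratory allele prefix, then Ex or Is, then a number, then an optional description in square brackets. Transgene and parse_transgene are names invented for this example.

```python
import re
from typing import NamedTuple, Optional

# Lab allele prefix, "Ex" (extrachromosomal) or "Is" (integrated), a number,
# and an optional bracketed description of the transgene.
TRANSGENE = re.compile(r"([a-z]{1,2})(Ex|Is)(\d+)(?:\[(.*)\])?")


class Transgene(NamedTuple):
    lab_prefix: str
    integrated: bool            # True for Is, False for Ex
    number: int
    description: Optional[str]  # text inside the square brackets, if given


def parse_transgene(name: str) -> Optional[Transgene]:
    """Return the parts of a transgene name, or None if it does not fit."""
    m = TRANSGENE.fullmatch(name)
    if m is None:
        return None
    prefix, kind, number, description = m.groups()
    return Transgene(prefix, kind == "Is", int(number), description)


print(parse_transgene("eIs2"))
print(parse_transgene("stEx5[sup-7(st5) unc-22(+)]"))
```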

 

Gene fusions incorporated in transgenes that consist of a C. elegans gene or part thereof fused to a reporter such as lacZ or GFP are indicated by the C. elegans  gene name followed by two colons and the reporter, all italicized:  pes-1::lacZ,  mab-9::GFP.

 

Genotypes

Mutants carrying more than one mutation are designated by sequentially listing mutant genes or mutations according to the left-right (= up-down) order on the genetic map. Different linkage groups are separated by a semicolon and given in the order I, II, III, IV, V, X, f. I-V are the five autosomes, X is the X chromosome, and f refers to free duplications or chromosomal fragments. For example: dpy-5(e61) I; bli-2(e768) II; unc-32(e189) III.
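
The ordering rule for linkage groups can be sketched as follows in Python. The function format_genotype is hypothetical; it assumes the caller supplies, for each linkage group, the mutations already listed in left-to-right map order, since map positions are not available to the code.

```python
from typing import Dict, List

# Order in which linkage groups are written, as given in the text.
LINKAGE_GROUP_ORDER = ["I", "II", "III", "IV", "V", "X", "f"]


def format_genotype(by_group: Dict[str, List[str]]) -> str:
    """Join mutations per linkage group, separating groups with semicolons."""
    unknown = set(by_group) - set(LINKAGE_GROUP_ORDER)
    if unknown:
        raise ValueError(f"unknown linkage group(s): {sorted(unknown)}")
    parts = []
    for group in LINKAGE_GROUP_ORDER:
        mutations = by_group.get(group)
        if mutations:
            # Mutations on one linkage group are space-separated and
            # followed by the linkage group designation.
            parts.append(" ".join(mutations) + " " + group)
    return "; ".join(parts)


print(format_genotype({
    "III": ["unc-32(e189)"],
    "I": ["dpy-5(e61)"],
    "II": ["bli-2(e768)"],
}))
# prints: dpy-5(e61) I; bli-2(e768) II; unc-32(e189) III
```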

 

Heterozygotes, with allelic differences between chromosomes, are designated by separating mutations on the two homologous chromosomes with a slash. Where unambiguous, wild-type alleles can be designated by a plus sign alone, or even omitted. For example, dpy-5(e61) unc-13(+)/dpy-5(+) unc-13(e51) I can also be written dpy-5 +/+ unc-13 or dpy-5/unc-13.

 

Transposons

C. elegans transposons are called Tc1, Tc2, etc., where each number represents a different family. Transposon names are not italicized except when included in a genotype. Transposon insertions in genes are indicated by adding ::Tc to the relevant mutation name, as an optional descriptor. Thus, a mutation of the gene unc-54 called r293 is a Tc1 insertion, and can therefore be written unc-54(r293::Tc1).

 

Chromosomal Aberrations

Duplications (Dp), deficiencies (Df), inversions (In) and translocations (T) are known in C. elegans cytogenetics; these are given italicized names consisting of the laboratory mutation prefix, the relevant abbreviation, and a number, optionally followed by the affected linkage groups in parentheses (e.g., eT1(III;V), mnDp5(X;f), where f indicates a free duplication). Chromosomal balancers of unknown structure can be designated using the abbreviation C, e.g., mnC1(II).
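
Names of chromosomal aberrations follow a regular pattern that can be parsed as shown in this Python sketch. It assumes only the five abbreviations listed above and a one- or two-letter laboratory prefix; parse_aberration and ABERRATION_KINDS are names chosen for illustration.

```python
import re
from typing import List, NamedTuple, Optional

ABERRATION_KINDS = {
    "Dp": "duplication",
    "Df": "deficiency",
    "In": "inversion",
    "T": "translocation",
    "C": "balancer of unknown structure",
}

# Lab prefix, abbreviation, number, optional "(group;group)" list.
ABERRATION = re.compile(r"([a-z]{1,2})(Dp|Df|In|T|C)(\d+)(?:\(([^()]*)\))?")


class Aberration(NamedTuple):
    lab_prefix: str
    kind: str
    number: int
    linkage_groups: List[str]  # empty when no groups are given


def parse_aberration(name: str) -> Optional[Aberration]:
    """Return the parts of an aberration name, or None if it does not fit."""
    m = ABERRATION.fullmatch(name)
    if m is None:
        return None
    prefix, abbrev, number, groups = m.groups()
    group_list = [g.strip() for g in groups.split(";")] if groups else []
    return Aberration(prefix, ABERRATION_KINDS[abbrev], int(number), group_list)


print(parse_aberration("eT1(III;V)"))
print(parse_aberration("mnDp5(X;f)"))
```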

 

The mitochondrial genome

The mitochondrial genotype of a worm can be expressed using the standard nomenclature, using M as the abbreviation for the mitochondrial linkage group.  The mitochondrial genotype is written as the last element in the genotype, following the nuclear genotype.   Heteroplasmic combinations, where mitochondria of different genotypes co-exist in the same cytoplasm, can be expressed using a double forward slash, //.  For example, "uaDf5//+".

 

Proteins

The protein product of a gene can be referred to by the relevant gene name, written in non-italic capitals, e.g., the protein encoded by unc-13 can be called UNC-13. Where more than one protein product is predicted for a gene (usually as a result of alternative message processing), the different proteins are distinguished by additional capital letters, e.g., TRA-1A, TRA-1B.

Mutant protein products can be named by the missense change, for example a mutant TRA-1A protein with a Pro to Leu change at codon 79 would be written: TRA-1A (P79L).
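
The protein naming conventions amount to simple string transformations, sketched here in Python. The helper names are hypothetical, and the one-letter amino acid codes are supplied by the caller rather than validated.

```python
def protein_name(gene: str, isoform: str = "") -> str:
    """Gene name in capitals, plus an optional isoform letter: tra-1, 'A' -> TRA-1A."""
    return gene.upper() + isoform.upper()


def mutant_protein_name(protein: str, wild_type: str, position: int, mutant: str) -> str:
    """Append a missense change, e.g. Pro to Leu at codon 79 -> 'TRA-1A (P79L)'."""
    return f"{protein} ({wild_type}{position}{mutant})"


print(protein_name("unc-13"))
print(mutant_protein_name(protein_name("tra-1", "A"), "P", 79, "L"))
```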

 

RNA molecules

Messenger RNA species can be written by using the protein product as a descriptor, for example TRA-1A mRNA, TRA-1B mRNA, in order to allow distinction between different splice variants.

 

Non-coding RNA species can be written using the gene name as a descriptor, for example lin-4 RNA.  Small RNA species derived from mir  genes (micro-RNAs) can be written miR-,  followed by a number corresponding to the mir gene.  Example:  miR-2 for the RNA derived from mir-2.

 

Phenotypes

Phenotypic characteristics can be described in words, e.g., dumpy animals or uncoordinated animals. If more convenient, a nonitalicized three-letter abbreviation, which usually corresponds to a gene name, may be used. The first letter of a phenotypic abbreviation is capitalized, e.g., Unc for uncoordinated, Dpy for dumpy. If necessary to distinguish among related but distinguishable phenotypes, the relevant gene number can be added, e.g., Unc-4 and Unc-13 to differentiate the distinct phenotypes produced by mutations in the two genes unc-4 and unc-13. Abbreviations that do not correspond to gene names can also be used, e.g., Muv for multiple vulval development.

 

A common and accepted convention, when comparing a mutant with the wild-type, is to use the prefix non- to refer to the wild-type phenotypes, for example, non-Lin (= wild type cell lineage) or Dpy non-Unc (= wild type with respect to movement, but dumpy with respect to body shape).

 

 

RNAi phenotypes

Animals in which an endogenous gene has been down-regulated by RNA interference (RNAi), after exposure to double-stranded RNA corresponding to that gene, can be referred to as mutants, using italicized RNAi as the mutation name.  Example: mog-4(RNAi).

 

Phenotypes induced by RNAi can be named using conventional mutant phenotype descriptors, such as Unc, Muv, Fem.   For high throughput RNAi screens, which may detect only conspicuous phenotypes, a limited set of about forty standard phenotype descriptors has been established (see list on WormBase).

 

 

Strains

A strain is a set of individuals of a particular genotype with the capacity to produce more individuals of the same genotype. Strains are given nonitalicized names consisting of two uppercase letters followed by a number. The strain letter prefixes refer to the laboratory of origin and are distinct from the mutation letter prefixes.

Examples: CB1833 is a strain of genotype dpy-5(e61) unc-13(e51), originally constructed by S. Brenner at the MRC Laboratory of Molecular Biology (strain prefix CB, allele prefix e),

and MT688 is a strain of genotype unc-32(e189) +/+ lin-12(n137) III; him-5(e1467) V, constructed in the laboratory of H.R. Horvitz at M.I.T. (strain prefix MT, allele prefix n).
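
A strain name can be split into its laboratory prefix and number with a short pattern. The Python sketch below assumes the two-uppercase-letter form described above; the three-letter designations used mainly for other nematode species are not covered. parse_strain is a hypothetical name.

```python
import re
from typing import Optional, Tuple

# Two uppercase letters (laboratory strain prefix) followed by a number.
STRAIN = re.compile(r"([A-Z]{2})(\d+)")


def parse_strain(name: str) -> Optional[Tuple[str, int]]:
    """Return (lab_prefix, number), or None if the name does not fit."""
    m = STRAIN.fullmatch(name)
    if m is None:
        return None
    return m.group(1), int(m.group(2))


print(parse_strain("CB1833"))
print(parse_strain("MT688"))
```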

 

Strain prefixes are listed at:  http://biosci.umn.edu/CGC/Nomenclature/code.htm.

 

Some 3-letter laboratory designations are also in use, mainly to refer to strains of nematode species other than C. elegans.

 

Strains can and should be preserved as frozen stocks at –70 °C or, ideally, in liquid nitrogen, in order to ensure long-term maintenance and to avoid drift or accumulation of modifier mutations.

 

 

Sources

All genetic data for C. elegans are summarized in the ACeDB interactive database (Eeckman and Durbin, 1995 Methods in Cell Biology 48: 584-605) and WormBase.

 

Queries on recommended nomenclature for C. elegans should be addressed to:

 

J. Hodgkin

Genetics Unit, Department of Biochemistry, University of Oxford, Oxford OX1 3QU, UK

Tel  +44 1865 275317

Fax +44 1865 275318

email:  jah@bioch.ox.ac.uk

 

or

 

R. K. Herman

Caenorhabditis Genetics Center, University of Minnesota, 1445 Gortner Avenue, St. Paul, MN 55108, USA

Tel:  +1 612 624 6203

Fax: +1 612 625 5754

email: bob-h@biosci.cbs.umn.edu