Computational Identification of Operons in Microbial Genomes

Yu Zheng (1), Joseph D. Szustakowski (2), Lance Fortnow (3), Richard J. Roberts (4), and Simon Kasif (1,2,5)

(1) Bioinformatics Graduate Program, Boston University, Boston, Massachusetts 02215, USA
(2) Biomedical Engineering Department, Boston University, Boston, Massachusetts 02215, USA
(3) NEC Research Institute, Inc., Princeton, New Jersey 08540, USA
(4) New England BioLabs, Beverly, Massachusetts 01915, USA

Abstract

We propose a new computational pipeline that applies graph representations of biochemical pathways to find potential operons in microbial genomes. The algorithm relies on the observation that genes in an operon tend to encode enzymes that catalyze successive reactions in metabolic pathways. We applied this algorithm to 42 microbial genomes to identify putative operon structures. The predicted operons from Escherichia coli were compared with a selected metabolism-related operon dataset from the RegulonDB database, yielding a prediction sensitivity of 89% and a specificity of 87% relative to this dataset. Several examples of detected operons are given and analyzed. Modular gene cluster transfer and operon fusion are observed. We further suggest using predicted operon data to assign function to putative genes; as an example, a previously putative gene (MJ1604) from Methanococcus jannaschii is now annotated as a phosphofructokinase, an enzyme previously regarded as missing in this organism. We examined GC content changes between operon and nonoperon regions; the results reveal a clear GC content transition at the boundaries of putative operons. We also looked into the conservation of operons across genomes. A trp operon alignment is analyzed in depth to show gene loss and rearrangement in different organisms during operon evolution.
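
The core idea, that neighboring genes whose enzymes catalyze successive pathway reactions are operon candidates, can be illustrated with a small sketch. This is not the pipeline described in the article, whose details the abstract does not give. It is a toy illustration under stated assumptions: genes are listed in chromosomal order, the pathway is an undirected graph over enzyme labels, and two adjacent genes are linked when they share a strand and their enzymes lie within `max_steps` reactions of each other. The function names, the `max_steps` parameter, and the toy data are all hypothetical.

```python
from collections import deque


def pathway_distance(graph, start, goal, max_steps):
    """Breadth-first search distance between two enzymes in the pathway graph.

    Returns the number of steps, or None if goal is farther than max_steps.
    """
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == max_steps:
            continue  # do not expand beyond the allowed radius
        for neighbor in graph.get(node, ()):
            if neighbor == goal:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def candidate_operons(genes, graph, max_steps=1):
    """Group adjacent same-strand genes whose enzymes are close in the pathway.

    genes: list of (gene_id, strand, enzyme_or_None) in chromosomal order.
    Returns a list of clusters, each a list of gene ids (two or more genes).
    """
    clusters, current = [], []
    for gene in genes:
        _, strand, enzyme = gene
        if current and enzyme is not None:
            _, prev_strand, prev_enzyme = current[-1]
            close = pathway_distance(graph, prev_enzyme, enzyme, max_steps)
            if strand == prev_strand and close is not None:
                current.append(gene)
                continue
        # The chain is broken here: keep the cluster only if it has 2+ genes.
        if len(current) >= 2:
            clusters.append([g[0] for g in current])
        current = [gene] if enzyme is not None else []
    if len(current) >= 2:
        clusters.append([g[0] for g in current])
    return clusters


# Hypothetical toy data: a linear pathway E1 - E2 - E3 - E4 (undirected).
toy_graph = {
    "E1": ["E2"],
    "E2": ["E1", "E3"],
    "E3": ["E2", "E4"],
    "E4": ["E3"],
}
toy_genes = [
    ("g1", "+", "E1"),
    ("g2", "+", "E2"),
    ("g3", "+", "E3"),
    ("g4", "-", "E4"),   # opposite strand, so it breaks the cluster
    ("g5", "+", None),   # no enzyme annotation
    ("g6", "+", "E9"),   # enzyme not in this pathway
]

print(candidate_operons(toy_genes, toy_graph))  # [['g1', 'g2', 'g3']]
```

A real pipeline would also need to handle intergenic distance, unannotated genes inside a cluster, and enzymes that take part in several pathways; none of that is modeled here.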

Footnotes

  • 5 Corresponding author.

  • E-MAIL kasif{at}bu.edu; FAX (617) 353-6766.

  • Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.200602.

    • Received February 21, 2002.
    • Accepted June 12, 2002.