Predicting Gene Function From Patterns of Annotation

Oliver D. King (1), Rebecca E. Foulger (2), Selina S. Dwight (3), James V. White (4, 5), and Frederick P. Roth (1, 6)

  1. Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, Massachusetts 02115, USA
  2. FlyBase, Department of Genetics, University of Cambridge, Cambridge, CB2 3EH, England
  3. Department of Genetics, Stanford University School of Medicine, Stanford, California 94305-5120, USA
  4. JVWhite.Com, Cambridge, Massachusetts 02139, USA
  5. Department of Biomedical Engineering, Boston University, Boston, Massachusetts 02215, USA

Abstract

The Gene Ontology (GO) Consortium has produced a controlled vocabulary for annotation of gene function that is used in many organism-specific gene annotation databases. This allows the prediction of gene function based on patterns of annotation. For example, if annotations for two attributes tend to occur together in a database, then a gene holding one attribute is likely to hold the other as well. We modeled the relationships among GO attributes with decision trees and Bayesian networks, using the annotations in the Saccharomyces Genome Database (SGD) and in FlyBase as training data. We tested the models using cross-validation, and we manually assessed 100 gene–attribute associations that were predicted by the models but that were not present in the SGD or FlyBase databases. Of the 100 manually assessed associations, 41 were judged to be true, and another 42 were judged to be plausible.
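
The co-occurrence intuition in the example above can be illustrated with a minimal Python sketch. This is not the decision-tree or Bayesian-network modeling described in the abstract; it only estimates, from raw counts, how often a gene holding attribute `a` also holds attribute `b`, and flags missing annotations whose estimated frequency passes a cutoff. The gene names, attribute labels, function names, and the `threshold` value are all hypothetical and chosen for illustration.

```python
from collections import Counter
from itertools import permutations

# Hypothetical toy annotations: gene -> set of GO-style attribute labels.
# Gene names and labels are invented; they are not taken from SGD or FlyBase.
annotations = {
    "geneA": {"DNA repair", "nucleus"},
    "geneB": {"DNA repair", "nucleus"},
    "geneC": {"DNA repair", "nucleus", "ATP binding"},
    "geneD": {"DNA repair"},
    "geneE": {"ATP binding"},
}


def conditional_frequencies(annotations):
    """Estimate P(b | a) as the fraction of genes holding a that also hold b."""
    single = Counter()  # number of genes holding each attribute
    pair = Counter()    # number of genes holding each ordered pair (a, b)
    for attrs in annotations.values():
        single.update(attrs)
        pair.update(permutations(sorted(attrs), 2))
    return {(a, b): n / single[a] for (a, b), n in pair.items()}


def predict_missing(annotations, threshold=0.7):
    """List (gene, predicted attribute, supporting attribute, frequency) tuples."""
    cond = conditional_frequencies(annotations)
    predictions = []
    for gene, attrs in annotations.items():
        for (a, b), p in cond.items():
            # Gene holds a, lacks b, and b usually accompanies a in the data.
            if a in attrs and b not in attrs and p >= threshold:
                predictions.append((gene, b, a, p))
    return sorted(predictions)


for gene, predicted, evidence, p in predict_missing(annotations):
    print(f"{gene}: predict '{predicted}' given '{evidence}' (frequency {p:.2f})")
```

On this toy data, three of the four genes annotated with "DNA repair" are also annotated with "nucleus", so the sketch proposes "nucleus" for geneD. A pairwise count like this ignores interactions among three or more attributes, which is one reason richer models such as decision trees and Bayesian networks may be preferred.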

[Detailed lists of hypotheses, including the curators' comments on each hypothesis, are available at http://llama.med.harvard.edu/~king/predictions.html.]

Footnotes

  • 6 Corresponding author.

  • E-MAIL froth{at}hms.Harvard.edu; FAX (617) 432-3557.

  • Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.440803. Article published online before print in April 2003.

    • Received May 18, 2002.
    • Accepted February 13, 2003.