A Cross-Genomic Approach for Systematic Mapping of Phenotypic Traits to Genes

  • Kam Jim¹
  • Kush Parmar²
  • Mona Singh¹,³,⁴
  • Saeed Tavazoie²,³,⁴
  1. Department of Computer Science, Princeton University, Princeton, New Jersey, 08544 USA
  2. Department of Molecular Biology, Princeton University, Princeton, New Jersey, 08544 USA
  3. Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, 08544 USA

Abstract

We present a computational method for de novo identification of gene function that uses only the cross-organismal distribution of phenotypic traits. Our approach assumes that proteins necessary for a set of phenotypic traits are preferentially conserved among organisms that share those traits. The method combines organism-to-phenotype associations with phylogenetic profiles to identify proteins that have high propensities for the query phenotype; it does not require functional annotations for any protein. We first present the statistical foundations of this approach and then apply it to a range of phenotypes to assess how its performance depends on the frequency and specificity of the phenotype. Our analysis shows that statistically significant associations are possible as long as the phenotype is neither extremely rare nor extremely common; results on the flagella, pili, thermophily, and respiratory tract tropism phenotypes suggest that reliable associations can be inferred when the phenotype does not arise from many alternate mechanisms.
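The abstract does not spell out the scoring statistic, so the following is only a minimal sketch under stated assumptions. It assumes each protein's phylogenetic profile has already been reduced to a 0/1 vector (homolog present or absent in each organism) and scores its overlap with a 0/1 phenotype vector using a one-sided hypergeometric test, a common choice for presence/absence overlap that may differ from the authors' own statistic. All function names are hypothetical. The last function shows why phenotype frequency matters: even a protein whose profile matches the phenotype perfectly can reach at best p = 1/C(N, K).

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N organisms, K with phenotype, n with homolog)."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def phenotype_association(profile, phenotype):
    """Hypothetical scorer: one-sided p-value for co-occurrence of a protein and a phenotype.

    profile, phenotype: equal-length 0/1 lists indexed by organism (assumed binarized).
    """
    N = len(profile)
    K = sum(phenotype)                                   # organisms with the phenotype
    n = sum(profile)                                     # organisms with a homolog
    k = sum(p & q for p, q in zip(profile, phenotype))   # organisms with both
    return hypergeom_sf(k, N, K, n)


def best_possible_p(N, K):
    """Smallest achievable p-value: homolog present in exactly the K phenotype organisms."""
    return 1 / comb(N, K)


if __name__ == "__main__":
    phenotype = [1, 1, 1, 0, 0, 0, 0, 0]
    perfect = [1, 1, 1, 0, 0, 0, 0, 0]   # profile matches the phenotype exactly
    noisy = [1, 0, 1, 0, 1, 0, 1, 0]     # partial overlap
    print(phenotype_association(perfect, phenotype))  # 1/56
    print(phenotype_association(noisy, phenotype))    # 0.5
    for K in (1, 5, 50, 95, 99):
        print(K, best_possible_p(100, K))
```

With 100 organisms, a phenotype found in only one organism (or in 99) cannot yield a p-value below 0.01 even for a perfectly matching protein, whereas a phenotype found in about half the organisms allows p-values many orders of magnitude smaller. This is the arithmetic behind the observation that significant associations require a phenotype that is neither extremely rare nor extremely common.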

Footnotes

  • [Supplemental material available online at www.genome.org.]

  • 5 In Figure 4, it is interesting to note that the organisms that exhibit flagella, yet have few homologs to the top 60 E. coli proteins in Table 2, are archaea.

  • Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.1586704.

  • 4 Corresponding authors. E-MAIL msingh{at}cs.princeton.edu; FAX (609) 258-1771. E-MAIL tavazoie{at}molbio.princeton.edu; FAX (609) 258-1701.

  • Received May 24, 2003.
  • Accepted October 29, 2003.