A Strategy to Retrieve the Whole Set of Protein Modules in Microbial Proteomes

  1. Stéphanie Le Bouder-Langevin (1),
  2. Isabelle Capron-Montaland (2),
  3. Renaud De Rosa (3), and
  4. Bernard Labedan

Évolution Moléculaire et Génomique, Institut de Génétique et Microbiologie, Université Paris-Sud, 91405 Orsay Cedex, France

Abstract

Protein homology is often limited to long structural segments that we have previously called modules. We describe here a suite of programs used to catalog the whole set of modules present in microbial proteomes. First, the Darwin AllAll program detects homologous segments using thresholds for evolutionary distance and alignment length, and another program classifies these modules. After assembling these homologous modules into families, we further group families that are related by a chain of neighboring unrelated homologous modules. Automatic analysis of these groups of families, which share homologous modules in independent multimodular proteins, makes it possible to split many fused modules into their component parts and/or to deduce more distant modules by logic. All detected and inferred modules are then reassembled into refined families. These last two steps are performed by a single program. Finally, the soundness of the data obtained by this experimental approach is checked using independent tests. To illustrate this modular approach, we compared four proteobacterial proteomes (Campylobacter jejuni, Escherichia coli, Haemophilus influenzae, and Helicobacter pylori). This method appears able to retrieve from present-day proteins many of the modules that can help to trace back ancient events of gene duplication and/or fusion.
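The first two steps summarized above (filtering pairwise matches by distance and length, then assembling modules into families) can be illustrated with a minimal Python sketch. It is not the authors' software: the threshold values, the record fields (`a`, `b`, `distance`, `length`), the segment labels, and the use of single-linkage clustering through a union-find structure are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import defaultdict

# Hypothetical thresholds: the abstract mentions such cutoffs but not their values.
MAX_DISTANCE = 250.0  # assumed upper bound on evolutionary distance
MIN_LENGTH = 80       # assumed lower bound on alignment length (residues)


def filter_matches(matches, max_distance=MAX_DISTANCE, min_length=MIN_LENGTH):
    """Keep only pairwise matches that pass both thresholds."""
    return [
        m for m in matches
        if m["distance"] <= max_distance and m["length"] >= min_length
    ]


def build_families(matches):
    """Group segment labels into families by single linkage (union-find).

    Single linkage is an assumption of this sketch, not a method
    confirmed by the abstract.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for m in matches:
        root_a, root_b = find(m["a"]), find(m["b"])
        if root_a != root_b:
            parent[root_a] = root_b

    families = defaultdict(set)
    for module in list(parent):
        families[find(module)].add(module)
    return sorted(sorted(members) for members in families.values())


# Invented example data: labels are "protein:start-end" placeholders.
SAMPLE_MATCHES = [
    {"a": "P1:10-120", "b": "P2:5-115", "distance": 120.0, "length": 110},
    {"a": "P2:5-115", "b": "P3:40-150", "distance": 200.0, "length": 105},
    {"a": "P4:1-90", "b": "P5:1-90", "distance": 90.0, "length": 90},
    {"a": "P1:10-120", "b": "P4:1-90", "distance": 300.0, "length": 85},  # too distant
    {"a": "P5:1-90", "b": "P6:1-60", "distance": 100.0, "length": 60},    # too short
]

FAMILIES = build_families(filter_matches(SAMPLE_MATCHES))
```

With these invented inputs, the two rejected matches leave two families: one holding the three segments from P1, P2, and P3, and one holding the segments from P4 and P5. The later steps (grouping families through neighboring modules and splitting fused modules) are not covered by this sketch.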

Footnotes

  • 1 Present address: ValiGen, Tour Neptune, 92086 Paris-La-Défense, France

  • 2 Present address: UMR 144 CNRS/Institut Curie, Bâtiment Lhomond, 26, rue d'Ulm, 75248 Paris Cedex 05, France

  • 3 Present address: Centre de Génétique Moléculaire, CNRS - Bâtiment 26, 91110 Gif-sur-Yvette, France

  • 4 Corresponding author.

  • E-MAIL labedan{at}igmors.u-psud.fr; FAX 33 (0)1 6915-78 08.

  • Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.393902.

    • Received May 5, 2002.
    • Accepted September 30, 2002.