Lineage-Specific Gene Expansions in Bacterial and Archaeal Genomes

  1. I. King Jordan1,
  2. Kira S. Makarova1,2,3,
  3. John L. Spouge1,
  4. Yuri I. Wolf1,3, and
  5. Eugene V. Koonin1,4
  1. 1National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA; 2Uniformed Services University of the Health Sciences, Bethesda, Maryland 20894, USA; 3Institute of Cytology and Genetics, Russian Academy of Sciences, Novosibirsk 630090, Russia

Abstract

Gene duplication is an important mechanistic antecedent to the evolution of new genes and novel biochemical functions. In an attempt to assess the contribution of gene duplication to genome evolution in archaea and bacteria, clusters of related genes that appear to have expanded subsequent to the diversification of the major prokaryotic lineages (lineage-specific expansions) were analyzed. Analysis of 21 completely sequenced prokaryotic genomes shows that lineage-specific expansions comprise a substantial fraction (∼5%–33%) of their coding capacities. A positive correlation exists between the fraction of the genes taken up by lineage-specific expansions and the total number of genes in a genome. Consistent with the notion that lineage-specific expansions are made up of relatively recently duplicated genes, >90% of the detected clusters consist of only two to four genes. The more common smaller clusters tend to include genes with higher pairwise similarity (as reflected by average score density) than larger clusters. Regardless of size, cluster members tend to be located closer together on bacterial chromosomes than expected by chance, which could reflect a history of tandem gene duplication. In addition to the small clusters, almost all genomes also contain rare large clusters of size ≥20. Several examples of the potential adaptive significance of these large clusters are explored. The presence or absence of clusters and their related genes was used as the basis for the construction of a similarity graph for completely sequenced prokaryotic genomes. The topology of the resulting graph seems to reflect a combined effect of common ancestry, horizontal transfer, and lineage-specific gene loss.

Footnotes

  • 4 Corresponding author.

  • E-MAIL koonin{at}ncbi.nlm.nih.gov; FAX (301) 480-9241.

  • Article published on-line before print: Genome Res., 10.1101/gr.166001.

  • Article and publication are at www.genome.org/cgi/doi/10.1101/gr.166001.

    • Received October 2, 2000.
    • Accepted January 9, 2001.