A Genome-Wide Survey of Human Pseudogenes

  1. David Torrents1,
  2. Mikita Suyama1,
  3. Evgeny Zdobnov, and
  4. Peer Bork2
  1. EMBL, Heidelberg 69117, Germany

Abstract

We screened all intergenic regions in the human genome to identify pseudogenes with a combination of homology searches and a functionality test using the ratio of replacement to silent nucleotide substitutions (KA/KS). We identified 19,724 regions, of which 95% ± 3% are estimated to evolve neutrally and thus are likely to encode pseudogenes. Half of these have no detectable truncation in their pseudocoding regions and therefore are not identifiable by methods that require the presence of truncations to prove nonfunctionality. A comparative analysis with the mouse genome showed that 70% of these pseudogenes have a retrotranspositional origin (processed), and the rest arose by segmental duplication (nonprocessed). Although the spread of both types of pseudogenes correlates with chromosome size, nonprocessed pseudogenes appear to be enriched in regions with high gene density. It is likely that the human pseudogenes identified here represent only a small fraction of the total, which probably exceeds the number of genes.
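The reasoning behind the functionality test and the headline estimate can be illustrated with a minimal sketch. It assumes KA and KS have already been estimated for a candidate region by some external method; the function names and the `tolerance` cutoff are hypothetical and are not taken from the paper, whose actual statistical procedure is not described in this abstract. A KA/KS ratio near 1 is consistent with neutral evolution, whereas a ratio well below 1 suggests purifying selection on a functional sequence.

```python
def ka_ks_ratio(ka: float, ks: float) -> float:
    """Return KA/KS (replacement rate over silent rate)."""
    if ks <= 0:
        raise ValueError("KS must be positive to form the ratio")
    return ka / ks


def looks_neutral(ka: float, ks: float, tolerance: float = 0.5) -> bool:
    """Flag a region as plausibly neutral when KA/KS is close to 1.

    The tolerance is an illustrative assumption, not a value from the paper.
    """
    return abs(ka_ks_ratio(ka, ks) - 1.0) <= tolerance


def estimated_pseudogenes(regions: int, percent: int, margin: int) -> tuple[int, int, int]:
    """Convert 'percent +/- margin' of the regions into (low, central, high) counts."""
    low = round(regions * (percent - margin) / 100)
    central = round(regions * percent / 100)
    high = round(regions * (percent + margin) / 100)
    return low, central, high


# Figures from the abstract: 19,724 regions, 95% +/- 3% estimated neutral.
print(estimated_pseudogenes(19724, 95, 3))  # (18146, 18738, 19330)
```

Under these assumptions, the abstract's figures correspond to roughly 18,100 to 19,300 neutrally evolving regions out of the 19,724 identified.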

Footnotes

  • [Supplemental information as well as the sequences identified in this work can be found at http://www.bork.embl-heidelberg.de/Docu/Human_Pseudogenes/.]

  • Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.1455503.

  • 1 These authors contributed equally to this work.

  • 2 Corresponding author. E-MAIL bork{at}embl-heidelberg.de; FAX 11-49-6221-387-517.

    • Received April 24, 2003.
    • Accepted September 30, 2003.