A comprehensive catalog of human KRAB-associated zinc finger genes: Insights into the evolutionary history of a large family of transcriptional repressors

  1. Stuart Huntley1,
  2. Daniel M. Baggott1,
  3. Aaron T. Hamilton1,
  4. Mary Tran-Gyamfi1,
  5. Shan Yang1,
  6. Joomyeong Kim3,
  7. Laurie Gordon1,
  8. Elbert Branscomb2, and
  9. Lisa Stubbs1,4
  1 Genome Biology and 2 Microbial Systems Divisions, Biosciences, Lawrence Livermore National Laboratory, Livermore, California 94550, USA

  3 Present address: Louisiana State University, Baton Rouge, LA.

Abstract

Krüppel-type zinc finger (ZNF) motifs are prevalent components of transcription factor proteins in all eukaryotes. KRAB-ZNF proteins, in which a potent repressor domain is attached to a tandem array of DNA-binding zinc-finger motifs, are specific to tetrapod vertebrates and represent the largest class of ZNF proteins in mammals. To define the full repertoire of human KRAB-ZNF proteins, we searched the genome sequence for key motifs and then constructed and manually curated gene models incorporating those sequences. The resulting gene catalog contains 423 KRAB-ZNF protein-coding loci, yielding alternative transcripts that altogether predict at least 742 structurally distinct proteins. Active rounds of segmental duplication, involving single genes or larger regions and including both tandem and distributed duplication events, have driven the expansion of this mammalian gene family. Comparisons between the human genes and ZNF loci mined from the draft mouse, dog, and chimpanzee genomes not only identified 103 KRAB-ZNF genes that are conserved in mammals but also highlighted a substantial level of lineage-specific change; at least 136 KRAB-ZNF coding genes are primate specific, including many recent duplicates. KRAB-ZNF genes are widely expressed, and clustered genes are typically not coregulated, indicating that paralogs have evolved to fill roles in many different biological processes. To facilitate further study, we have developed a Web-based public resource that provides access to gene models, sequences, and other data, including visualization tools that supply genomic context and allow interaction with other public data sets.

Footnotes

  • 4 Corresponding author. E-mail stubbs5{at}llnl.gov; fax (925) 422-2099.

  • [Supplemental material is available online at www.genome.org.]

  • Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.4842106

    • Received October 21, 2005.
    • Accepted March 6, 2006.