Word frequency analysis reveals enrichment of dinucleotide repeats on the human X chromosome and [GATA]n in the X escape region

John A. McNeil, Kelly P. Smith, Lisa L. Hall, and Jeanne B. Lawrence¹

Department of Cell Biology, University of Massachusetts Medical School, Worcester, Massachusetts 01655, USA

Abstract

Most of the human genome encodes neither protein nor known functional RNA, yet available approaches to seek meaningful information in the “noncoding” sequence are limited. The unique biology of the X chromosome, one of which is silenced in mammalian females, can yield clues to sequence motifs involved in chromosome packaging and function. Although autosomal chromatin has some capacity for inactivation, evidence indicates that sequences enriched on the X chromosome render it fully competent for silencing, except in specific regions that escape inactivation. Here we have used a linguistic approach by analyzing the frequency and distribution of nine-base-pair genomic “words” throughout the human genome. Results identify previously unknown sequence differences on the human X chromosome. Notably, the dinucleotide repeats [AT]n, [AC]n, and [AG]n are significantly enriched across the X chromosome compared with autosomes. Moreover, a striking enrichment (>10-fold) of [GATA]n is revealed throughout the 10-Mb segment at Xp22 that escapes inactivation, and is confirmed by fluorescence in situ hybridization. A similar enrichment is found in other eutherian genomes. Our findings clearly demonstrate sequence differences relevant to the novel biology and evolution of the X chromosome. Furthermore, they implicate simple sequence repeats, linked to gene regulation and unusual DNA structures, in the regulation and formation of facultative heterochromatin. Results suggest a new paradigm whereby a regional escape from X inactivation is due to the presence of elements that prevent heterochromatinization, rather than the lack of other elements that promote it.
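The word-frequency idea can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `count_words` and `enrichment`, the pseudocount smoothing, and the toy sequences are all assumptions made for illustration. The sketch counts every overlapping nine-letter word in a sequence and then compares each word's relative frequency in a target sequence (for example, X-linked sequence) against a background (for example, autosomal sequence).

```python
from collections import Counter


def count_words(sequence, k=9):
    """Count every overlapping k-letter word in a DNA sequence.

    Words containing anything other than A, C, G, or T (e.g. N) are skipped.
    """
    seq = sequence.upper()
    valid = set("ACGT")
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= valid:
            counts[word] += 1
    return counts


def enrichment(target_counts, background_counts, pseudocount=1.0):
    """Per-word ratio of relative frequency in target versus background.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for words that are absent from the background.
    """
    target_total = sum(target_counts.values())
    background_total = sum(background_counts.values())
    ratios = {}
    for word in set(target_counts) | set(background_counts):
        t = (target_counts.get(word, 0) + pseudocount) / target_total
        b = (background_counts.get(word, 0) + pseudocount) / background_total
        ratios[word] = t / b
    return ratios


# Toy sequences (hypothetical): a [GATA]n-rich target and an unrelated background.
target = count_words("GATA" * 10)
background = count_words("ACGT" * 10)
ratios = enrichment(target, background)

# List the words most enriched in the target relative to the background.
for word, ratio in sorted(ratios.items(), key=lambda kv: -kv[1])[:4]:
    print(word, round(ratio, 2))
```

On real data the same counting would be run per chromosome or per genomic window, so that regional enrichment, such as that reported for [GATA]n at Xp22, shows up as a cluster of windows with high ratios for the words that make up the repeat.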

Footnotes

  • ¹ Corresponding author. E-mail jeanne.lawrence{at}umassmed.edu; fax (508) 856-5178.

  • [Supplemental material is available online at www.genome.org.]

  • Article published online ahead of print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.4627606

    • Received August 30, 2005.
    • Accepted January 17, 2006.