Decoding Human Regulatory Circuits

  William Thompson (1,5), Michael J. Palumbo (1), Wyeth W. Wasserman (2), Jun S. Liu (3), and Charles E. Lawrence (1,4)
  1. Center for Bioinformatics, The Wadsworth Center, New York State Department of Health, Albany, New York 12208, USA
  2. Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia V5Z 4H4, Canada
  3. Department of Statistics, Harvard University, Cambridge, Massachusetts 02138, USA
  4. Computer Science Department, Rensselaer Polytechnic Institute, Troy, New York 12180, USA

Abstract

Clusters of transcription factor binding sites (TFBSs) that direct gene expression constitute cis-regulatory modules (CRMs). We present a novel algorithm, based on Gibbs sampling, that locates, de novo, the cis features of these CRMs, their component TFBSs, and the properties of their spatial distribution. The algorithm finds 69% of experimentally reported TFBSs and 85% of the CRMs in a reference data set of regions upstream of genes differentially expressed in skeletal muscle cells. A discriminant procedure based on the model's output specifically distinguished regulatory sequences in muscle-specific genes in an independent test set. Applying the method to 2,710 10-kb fragments upstream of annotated human genes identified 17 novel candidate modules at a false discovery rate ≤0.05, demonstrating that the method is applicable to genome-scale data.

Footnotes

  • [Supplemental material is available online at www.genome.org.]

  • Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.2589004.

  • 5 Corresponding author. E-mail: thompson@wadsworth.org; fax: (518) 402-4623.

  • Received March 17, 2004; accepted July 22, 2004.