PRISM offers a comprehensive genomic approach to transcription factor function prediction
- Aaron M. Wenger1,
- Shoa L. Clarke2,
- Harendra Guturu3,
- Jenny Chen4,
- Bruce T. Schaar5,
- Cory Y. McLean1 and
- Gill Bejerano1,5,6
- 1Department of Computer Science, Stanford University, Stanford, California 94305, USA;
- 2Department of Genetics, Stanford University, Stanford, California 94305, USA;
- 3Department of Electrical Engineering, Stanford University, Stanford, California 94305, USA;
- 4Biomedical Informatics Program, Stanford University, Stanford, California 94305, USA;
- 5Department of Developmental Biology, Stanford University, Stanford, California 94305, USA
Abstract
The human genome encodes 1500–2000 different transcription factors (TFs). ChIP-seq is revealing the global binding profiles of a fraction of TFs in a fraction of their biological contexts. These data show that the majority of TFs bind directly next to a large number of context-relevant target genes, that most binding is distal, and that binding is context specific. Because of the effort and cost involved, ChIP-seq is seldom used to search for novel TF functions; such exploration is instead done using expression perturbation and genetic screens. Here we propose a comprehensive computational framework for predicting transcription factor function. We curate 332 high-quality, nonredundant TF binding motifs that represent all major DNA-binding domains, and improve cross-species conserved binding site prediction to obtain 3.3 million conserved, mostly distal, binding site predictions. We combine these with 2.4 million facts about all human and mouse gene functions in a novel statistical framework that searches for enrichment of particular motifs next to groups of target genes with particular functions. Rigorous parameter tuning and a harsh null model are used to minimize false positives. Our novel PRISM (predicting regulatory information from single motifs) approach obtains 2543 TF function predictions across a wide variety of contexts at a false discovery rate of 16%. The predictions are highly enriched for validated TF roles, and 45 of 67 (67%) tested binding site regions in five different contexts act as enhancers in functionally matched cells.
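The abstract does not spell out the enrichment statistic. The sketch below shows one common way to score whether a motif's conserved sites cluster near genes of one function: a region-based binomial test in the style of the GREAT tool. Treat that choice as an assumption for illustration, not as the PRISM procedure. Under the null model, each of a motif's `n_sites` genome-wide sites lands in the regulatory domains of a gene set with probability equal to the fraction of the genome those domains cover. The p-value is the chance of seeing at least `k_hits` such sites. The helper names and all numbers are hypothetical. Parameter tuning, the harsh null, and FDR control are not shown.

```python
import math


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), summing pmf terms in log space
    so that large n does not overflow the binomial coefficients."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)  # guard against rounding slightly above 1


def motif_geneset_enrichment(n_sites: int, k_hits: int, domain_fraction: float):
    """Hypothetical helper scoring one (motif, gene set) pair.

    n_sites: conserved binding-site predictions for one motif, genome-wide
    k_hits: how many of those sites fall in the gene set's regulatory domains
    domain_fraction: genome fraction covered by those domains (null hit probability)
    """
    expected = n_sites * domain_fraction
    fold = k_hits / expected
    p_value = binomial_upper_tail(k_hits, n_sites, domain_fraction)
    return fold, p_value


# Made-up numbers for illustration only; these are not data from the paper.
fold, p_value = motif_geneset_enrichment(n_sites=1000, k_hits=25, domain_fraction=0.01)
print(f"fold enrichment = {fold:.2f}, binomial p = {p_value:.2e}")
```

In this toy case, 10 hits are expected and 25 are observed, a 2.5-fold enrichment. A full analysis would score every motif against every functional gene set and then correct for the very large number of tests.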
Footnotes
- 6 Corresponding author
E-mail bejerano{at}stanford.edu
- [Supplemental material is available for this article.]
- Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.139071.112.
Freely available online through the Genome Research Open Access option.
- Received February 12, 2012.
- Accepted January 25, 2013.
This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 3.0 Unported License), as described at http://creativecommons.org/licenses/by-nc/3.0/.