Interpretation of allele-specific chromatin accessibility using cell state–aware deep learning
- Zeynep Kalender Atak1,2,5,6,
- Ibrahim Ihsan Taskiran1,2,5,
- Jonas Demeulemeester1,2,3,
- Christopher Flerin1,2,
- David Mauduit1,2,
- Liesbeth Minnoye1,2,
- Gert Hulselmans1,2,
- Valerie Christiaens1,2,
- Ghanem-Elias Ghanem4,
- Jasper Wouters1,2 and
- Stein Aerts1,2
- 1VIB-KU Leuven Center for Brain and Disease Research, 3000 Leuven, Belgium;
- 2KU Leuven, Department of Human Genetics KU Leuven, 3000 Leuven, Belgium;
- 3Cancer Genomics Laboratory, The Francis Crick Institute, London NW1 1AT, United Kingdom;
- 4Institut Jules Bordet, Université Libre de Bruxelles, 1000 Brussels, Belgium
- 5These authors contributed equally to this work.
Abstract
Genomic sequence variation within enhancers and promoters can have a significant impact on the cellular state and phenotype. However, sifting through the millions of candidate variants in a personal genome or a cancer genome to identify those that impact cis-regulatory function remains a major challenge. Interpretation of noncoding genome variation benefits from explainable artificial intelligence to predict and interpret the impact of a mutation on gene regulation. Here we generate phased whole genomes with matched chromatin accessibility, histone modifications, and gene expression for 10 melanoma cell lines. We find that training a specialized deep learning model, called DeepMEL2, on melanoma chromatin accessibility data can capture the various regulatory programs of the melanocytic and mesenchymal-like melanoma cell states. This model outperforms motif-based variant scoring, as well as more generic deep learning models. We detect hundreds to thousands of allele-specific chromatin accessibility variants (ASCAVs) in each melanoma genome, of which 15%–20% can be explained by gains or losses of transcription factor binding sites. A considerable fraction of ASCAVs are caused by changes in AP-1 binding, as confirmed by matched ChIP-seq data showing allele-specific binding of JUN and FOSL1. Finally, by augmenting the DeepMEL2 model with ChIP-seq data for GABPA, the TERT promoter mutation, as well as additional ETS motif gains, can be identified with high confidence. In conclusion, we present a new integrative genomics approach and a deep learning model to identify and interpret functional enhancer mutations with allelic imbalance of chromatin accessibility and gene expression.
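The abstract does not state which statistical test was used to call ASCAVs. As a minimal illustration of what "allelic imbalance of chromatin accessibility" means at a single heterozygous site, the sketch below applies an exact two-sided binomial test to reference and alternate read counts. The function name `binomial_two_sided_p`, the null reference-allele fraction of 0.5, and the example read counts are assumptions made for this illustration and are not taken from the article.

```python
from math import comb


def binomial_two_sided_p(ref_reads: int, alt_reads: int, p_ref: float = 0.5) -> float:
    """Exact two-sided binomial p-value for allelic imbalance at one site.

    Hypothetical helper for illustration; assumes that under the null
    hypothesis each read comes from the reference allele with
    probability p_ref.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, no evidence of imbalance

    # Probability of every possible reference-read count under the null.
    pmf = [comb(n, k) * p_ref**k * (1 - p_ref) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]

    # Sum the outcomes that are at least as unlikely as the observed one
    # (small tolerance guards against floating-point ties).
    p_value = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, p_value)


# Example with made-up counts: 10 reference reads, 0 alternate reads.
print(binomial_two_sided_p(10, 0))
```

This brute-force enumeration is only suitable for modest read depths; at high coverage a library routine such as `scipy.stats.binomtest` would be the more practical choice, and a genome-wide analysis would also need multiple-testing correction.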
Footnotes
- [Supplemental material is available for this article.]
- Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.260851.120.
- Freely available online through the Genome Research Open Access option.
- Received January 30, 2020.
- Accepted April 5, 2021.
This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.