Interpretation of allele-specific chromatin accessibility using cell state–aware deep learning

  1. Stein Aerts1,2
  1. VIB-KU Leuven Center for Brain and Disease Research, 3000 Leuven, Belgium;
  2. KU Leuven, Department of Human Genetics KU Leuven, 3000 Leuven, Belgium;
  3. Cancer Genomics Laboratory, The Francis Crick Institute, London NW1 1AT, United Kingdom;
  4. Institut Jules Bordet, Université Libre de Bruxelles, 1000 Brussels, Belgium
  5. These authors contributed equally to this work.

  • 6 Present address: Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge CB2 0RE, UK

  • Corresponding author: stein.aerts@kuleuven.vib.be
  • Abstract

    Genomic sequence variation within enhancers and promoters can have a significant impact on the cellular state and phenotype. However, sifting through the millions of candidate variants in a personal genome or a cancer genome, to identify those that impact cis-regulatory function, remains a major challenge. Interpretation of noncoding genome variation benefits from explainable artificial intelligence to predict and interpret the impact of a mutation on gene regulation. Here we generate phased whole genomes with matched chromatin accessibility, histone modifications, and gene expression for 10 melanoma cell lines. We find that training a specialized deep learning model, called DeepMEL2, on melanoma chromatin accessibility data can capture the various regulatory programs of the melanocytic and mesenchymal-like melanoma cell states. This model outperforms motif-based variant scoring, as well as more generic deep learning models. We detect hundreds to thousands of allele-specific chromatin accessibility variants (ASCAVs) in each melanoma genome, of which 15%–20% can be explained by gains or losses of transcription factor binding sites. A considerable fraction of ASCAVs are caused by changes in AP-1 binding, as confirmed by matched ChIP-seq data to identify allele-specific binding of JUN and FOSL1. Finally, by augmenting the DeepMEL2 model with ChIP-seq data for GABPA, the TERT promoter mutation, as well as additional ETS motif gains, can be identified with high confidence. In conclusion, we present a new integrative genomics approach and a deep learning model to identify and interpret functional enhancer mutations with allelic imbalance of chromatin accessibility and gene expression.
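    To make the notion of an allele-specific chromatin accessibility variant concrete, the following minimal sketch tests whether accessibility reads at a heterozygous site deviate from the 50/50 split expected when both alleles are equally accessible. The abstract does not state which statistical test was used, so the exact two-sided binomial test, the function names, the depth filter, and the significance threshold are all illustrative assumptions and not the authors' pipeline.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probability of every outcome that is no more likely than the
    observed count k. Intended for modest depths (up to a few hundred reads).
    """
    if n == 0:
        return 1.0
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(x for x in pmf if x <= threshold))


def allelic_imbalance(ref_reads: int, alt_reads: int,
                      alpha: float = 0.05, min_depth: int = 10):
    """Hypothetical helper: flag a heterozygous site as allele-imbalanced.

    Returns None when coverage is below min_depth (an assumed filter),
    otherwise a dict with the alternate-allele fraction, the p-value,
    and whether the p-value falls below alpha.
    """
    n = ref_reads + alt_reads
    if n < min_depth:
        return None
    p_value = binomial_two_sided_p(alt_reads, n)
    return {
        "alt_fraction": alt_reads / n,
        "p_value": p_value,
        "imbalanced": p_value < alpha,
    }


if __name__ == "__main__":
    # Made-up read counts for illustration only.
    print(allelic_imbalance(ref_reads=2, alt_reads=18))
    print(allelic_imbalance(ref_reads=5, alt_reads=5))
```

    In practice a test like this would be applied to every phased heterozygous variant, followed by multiple-testing correction and a correction for reference mapping bias, neither of which is shown here.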

    Footnotes

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.260851.120.

    • Freely available online through the Genome Research Open Access option.

    • Received January 30, 2020.
    • Accepted April 5, 2021.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.
