A sequence-based, deep learning model accurately predicts RNA splicing branchpoints

Gill Bejerano 1,2,3,4

1 Department of Computer Science, Stanford University, Stanford, California 94305, USA
2 Department of Developmental Biology, Stanford University, Stanford, California 94305, USA
3 Department of Pediatrics, Stanford University, Stanford, California 94305, USA
4 Department of Biomedical Data Science, Stanford University, Stanford, California 94305, USA

Corresponding authors: jpaggi{at}stanford.edu, bejerano{at}stanford.edu

Abstract

Experimental detection of RNA splicing branchpoints is difficult. To date, high-confidence experimental annotations exist for 18% of 3′ splice sites in the human genome. We develop a deep-learning-based branchpoint predictor, LaBranchoR, which predicts a correct branchpoint for at least 75% of 3′ splice sites genome-wide. Detailed analysis of cases in which our predicted branchpoint deviates from experimental data suggests a correct branchpoint is predicted in over 90% of cases. We use our predicted branchpoints to identify a novel sequence element upstream of branchpoints consistent with extended U2 snRNA base-pairing, show an association between weak branchpoints and alternative splicing, and explore the effects of genetic variants on branchpoints. We provide genome-wide branchpoint annotations and in silico mutagenesis scores at http://bejerano.stanford.edu/labranchor.

  • Received March 14, 2018.
  • Accepted September 10, 2018.

This article is distributed exclusively by the RNA Society for the first 12 months after the full-issue publication date (see http://rnajournal.cshlp.org/site/misc/terms.xhtml). After 12 months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
