A sequence-based, deep learning model accurately predicts RNA splicing branchpoints

Joseph M. Paggi; Gill Bejerano

doi:10.1261/rna.066290.118

Joseph M. Paggi¹ and Gill Bejerano¹,²,³,⁴

¹Department of Computer Science, Stanford University, Stanford, California 94305, USA
²Department of Developmental Biology, Stanford University, Stanford, California 94305, USA
³Department of Pediatrics, Stanford University, Stanford, California 94305, USA
⁴Department of Biomedical Data Science, Stanford University, Stanford, California 94305, USA

Corresponding authors: jpaggi{at}stanford.edu, bejerano{at}stanford.edu

Abstract

Experimental detection of RNA splicing branchpoints is difficult. To date, high-confidence experimental annotations exist for 18% of 3′ splice sites in the human genome. We develop a deep-learning-based branchpoint predictor, LaBranchoR, which predicts a correct branchpoint for at least 75% of 3′ splice sites genome-wide. Detailed analysis of cases in which our predicted branchpoint deviates from experimental data suggests that a correct branchpoint is predicted in over 90% of cases. We use our predicted branchpoints to identify a novel sequence element upstream of branchpoints consistent with extended U2 snRNA base-pairing, show an association between weak branchpoints and alternative splicing, and explore the effects of genetic variants on branchpoints. We provide genome-wide branchpoint annotations and in silico mutagenesis scores at http://bejerano.stanford.edu/labranchor.

Footnotes

Article is online at http://www.rnajournal.org/cgi/doi/10.1261/rna.066290.118.

Received March 14, 2018.
Accepted September 10, 2018.

This article is distributed exclusively by the RNA Society for the first 12 months after the full-issue publication date (see http://rnajournal.cshlp.org/site/misc/terms.xhtml). After 12 months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.