iReckon: Simultaneous isoform discovery and abundance estimation from RNA-seq data

  Michael Brudno [1,2,3,7,8]

  1. Department of Computer Science, University of Toronto, Ontario M5S 2E4, Canada;
  2. Centre for Computational Medicine, Hospital for Sick Children, Toronto M5G 1L7, Canada;
  3. Genetics and Genome Biology, Hospital for Sick Children, Toronto M5G 1L7, Canada;
  4. Department of Genetics, University of North Carolina, Chapel Hill, North Carolina 27599, USA;
  5. Department of Molecular Oncology, BC Cancer Agency, Vancouver, British Columbia V5Z 1L3, Canada;
  6. Department of Pathology, University of British Columbia, Vancouver, British Columbia V6T 2B5, Canada;
  7. Donnelly Centre, University of Toronto, Ontario M5S 3E1, Canada

    Abstract

    High-throughput RNA sequencing (RNA-seq) promises to revolutionize our understanding of genes and their role in human disease by characterizing the RNA content of tissues and cells. The realization of this promise, however, is conditional on the development of effective computational methods for the identification and quantification of transcripts from incomplete and noisy data. In this article, we introduce iReckon, a method for simultaneous determination of the isoforms and estimation of their abundances. Our probabilistic approach incorporates multiple biological and technical phenomena, including novel isoforms, intron retention, unspliced pre-mRNA, PCR amplification biases, and multimapped reads. iReckon utilizes regularized expectation-maximization to accurately estimate the abundances of known and novel isoforms. Our results on simulated and real data demonstrate a superior ability to discover novel isoforms with a significantly reduced number of false-positive predictions, and the accuracy of our abundance estimates exceeds that of other state-of-the-art tools. Furthermore, we have applied iReckon to two cancer transcriptome data sets, a triple-negative breast cancer patient sample and the MCF7 breast cancer cell line, and show that iReckon is able to reconstruct complex splicing changes that were not previously identified. QT-PCR validations of the isoforms detected in the MCF7 cell line confirmed all of iReckon's predictions and also showed strong agreement (r² = 0.94) with the predicted abundances.
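
To make the idea of regularized expectation-maximization concrete, the following is a minimal Python sketch. It is not iReckon's implementation: the abstract does not specify the likelihood model or the form of the regularizer, so the sketch assumes a plain read-to-isoform mixture model and a simple count-shrinkage penalty that drives weakly supported isoforms to zero. The names `regularized_em`, `compat`, and `penalty` are hypothetical and chosen only for illustration.

```python
def regularized_em(compat, penalty=0.5, n_iter=200):
    """Estimate relative isoform abundances from read compatibilities.

    compat[r][k] is the (assumed) likelihood of read r under isoform k,
    and is 0 when the read is incompatible with that isoform.
    penalty is a hypothetical sparsity term, not iReckon's regularizer.
    """
    n_iso = len(compat[0])
    theta = [1.0 / n_iso] * n_iso  # start from uniform abundances

    for _ in range(n_iter):
        # E-step: split each read fractionally among compatible isoforms.
        counts = [0.0] * n_iso
        for row in compat:
            weights = [theta[k] * row[k] for k in range(n_iso)]
            total = sum(weights)
            if total == 0.0:
                continue  # read is explained by no remaining isoform
            for k in range(n_iso):
                counts[k] += weights[k] / total

        # M-step: shrink expected counts, clip at zero, renormalize.
        shrunk = [max(0.0, c - penalty) for c in counts]
        norm = sum(shrunk)
        if norm == 0.0:
            break  # penalty removed every isoform; keep the last estimate
        theta = [s / norm for s in shrunk]

    return theta


# Toy data: isoforms 0 and 1 have uniquely mapping reads; isoform 2 is a
# candidate supported only by reads that isoform 0 also explains.
reads = [[1, 0, 0]] * 6 + [[0, 1, 0]] * 3 + [[1, 0, 1]] * 2
theta = regularized_em(reads)
print(theta)
```

In this toy example the penalty removes the candidate isoform that has no unique read support, which mirrors, in a very simplified way, the goal of reducing false-positive isoform predictions while estimating abundances in the same procedure.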

    Footnotes

    • Received April 24, 2012.
    • Accepted November 21, 2012.

    This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 3.0 Unported License), as described at http://creativecommons.org/licenses/by-nc/3.0/.
