Systematic identification of long noncoding RNAs expressed during zebrafish embryogenesis

  Alexander F. Schier1,4,8
  1Department of Molecular and Cellular Biology (MCB), Harvard University, Cambridge, Massachusetts 02138, USA;
  2The Bioinformatics Centre, Department of Biology and the Biotech Research and Innovation Centre (BRIC), University of Copenhagen, Copenhagen DK-2200, Denmark;
  3Massachusetts Institute of Technology (MIT), Cambridge, Massachusetts 02139, USA;
  4The Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA;
  5Department of Stem Cell and Regenerative Biology (SCRB), Harvard University, Cambridge, Massachusetts 02138, USA;
  6Howard Hughes Medical Institute (HHMI), Chevy Chase, Maryland 20815, USA
  7These authors contributed equally to this work.

    Abstract

    Long noncoding RNAs (lncRNAs) comprise a diverse class of transcripts that structurally resemble mRNAs but do not encode proteins. Recent genome-wide studies in humans and the mouse have annotated lncRNAs expressed in cell lines and adult tissues, but a systematic analysis of lncRNAs expressed during vertebrate embryogenesis has been elusive. To identify lncRNAs with potential functions in vertebrate embryogenesis, we performed a time series of RNA-seq experiments at eight stages during early zebrafish development. We reconstructed 56,535 high-confidence transcripts in 28,912 loci, recovering the vast majority of expressed RefSeq transcripts while identifying thousands of novel isoforms and expressed loci. We defined a stringent set of 1133 noncoding multi-exonic transcripts expressed during embryogenesis. These include long intergenic ncRNAs (lincRNAs), intronic overlapping lncRNAs, exonic antisense overlapping lncRNAs, and precursors for small RNAs (sRNAs). Zebrafish lncRNAs share many of the characteristics of their mammalian counterparts: relatively short length, low exon number, low expression, and conservation levels comparable to those of introns. Subsets of lncRNAs carry chromatin signatures characteristic of genes with developmental functions. The temporal expression profile of lncRNAs revealed two novel properties: lncRNAs are expressed in narrower time windows than are protein-coding genes, and they are specifically enriched in early-stage embryos. In addition, several lncRNAs show tissue-specific expression and distinct subcellular localization patterns. Integrative computational analyses associated individual lncRNAs with specific pathways and functions, ranging from cell cycle regulation to morphogenesis. Our study provides the first systematic identification of lncRNAs in a vertebrate embryo and forms the foundation for future genetic, genomic, and evolutionary studies.

    Footnotes

    • 8 Corresponding authors.

      E-mail pauli{at}fas.harvard.edu.

      E-mail schier{at}fas.harvard.edu.

      E-mail aregev{at}broad.mit.edu.

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.133009.111.

    • Received October 12, 2011.
    • Accepted November 21, 2011.