Unique folding of precursor microRNAs: Quantitative evidence and implications for de novo identification

  Stanley NG Kwang Loong (1,2) and Santosh K. Mishra (1,2)
  1. Bioinformatics Institute, Matrix, Singapore 138671
  2. NUS Graduate School for Integrative Sciences and Engineering, Centre for Life Sciences, Singapore 117456

Abstract

MicroRNAs (miRNAs) participate in diverse cellular and physiological processes through the post-transcriptional gene regulatory pathway. The hairpin is a crucial structural feature for the computational identification of precursor miRNAs (pre-miRs), as its formation is critically associated with the early stages of mature miRNA biogenesis. Our incomplete knowledge of the number of miRNAs present in the genomes of vertebrates, worms, plants, and even viruses necessitates a thorough understanding of their sequence motifs, hairpin structural characteristics, and topological descriptors. In this in-depth study, we investigate a comprehensive and heterogeneous collection of 2241 published (nonredundant) pre-miRs across 41 species (miRBase 8.2), 8494 pseudohairpins extracted from the human RefSeq genes, 12,387 (nonredundant) ncRNAs spanning 457 types (Rfam 7.0), 31 full-length mRNAs randomly selected from GenBank, and four sets of synthetically generated genomic background corresponding to each of the native RNA sequences. Our large-scale characterization analysis reveals that pre-miRs are significantly different from other types of ncRNAs, pseudohairpins, mRNAs, and genomic background according to the nonparametric Kruskal–Wallis ANOVA (p < 0.001). We examine the intrinsic and global features at the sequence, structural, and topological levels, including %G+C content, normalized base-pairing propensity P(S), normalized minimum free energy of folding MFE(s), normalized Shannon entropy Q(s), normalized base-pair distance D(s), and degree of compactness F(S), as well as the corresponding Z scores of P(S), MFE(s), Q(s), D(s), and F(S). The findings will promote more accurate guidelines and distinctive criteria for the prediction of novel pre-miRs with improved performance.
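As a rough illustration of the simplest features named above, the following Python sketch computes %G+C content, a base-pairing propensity, and a Z score. It rests on assumptions not stated in the abstract: the secondary structure is supplied as a dot-bracket string (for example, from an external folding tool), the propensity is taken as the number of base pairs divided by sequence length, and the Z score is the native value standardized against values from the synthetic background set. The function names are hypothetical and this is not the authors' implementation.

```python
from statistics import mean, stdev
from typing import Sequence


def gc_percent(seq: str) -> float:
    """Percentage of G and C nucleotides in a sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def base_pair_propensity(structure: str) -> float:
    """Base pairs per nucleotide for a dot-bracket structure.

    Assumed definition: number of base pairs divided by length.
    """
    if not structure:
        raise ValueError("empty structure")
    if structure.count("(") != structure.count(")"):
        raise ValueError("unbalanced dot-bracket structure")
    # Each '(' opens exactly one base pair.
    return structure.count("(") / len(structure)


def z_score(native: float, background: Sequence[float]) -> float:
    """Standardize a native value against background values.

    Assumed definition: (native - mean) / sample standard deviation.
    """
    sd = stdev(background)
    if sd == 0:
        raise ValueError("background has zero variance")
    return (native - mean(background)) / sd


if __name__ == "__main__":
    # Toy inputs, chosen only to exercise the functions.
    print(gc_percent("GCAU"))
    print(base_pair_propensity("((((....))))"))
    print(z_score(2.0, [0.0, 1.0, 2.0]))
```

The folding-dependent features (minimum free energy, Shannon entropy, base-pair distance) are left out because they require a structure-prediction program, which this sketch does not assume.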

Footnotes

  • Reprint requests to: Stanley NG Kwang Loong, Bioinformatics Institute, 30 Biopolis Street, #07-01, Matrix, Singapore 138671; e-mail: stanley{at}bii.a-star.edu.sg; fax: +65-6478-9050.

  • Article published online ahead of print. Article and publication date are at http://www.rnajournal.org/cgi/doi/10.1261/rna.223807.

    • Received July 6, 2006.
    • Accepted November 7, 2006.