Incognito rRNA and rDNA in databases and libraries.

Abstract

Both ribosomal DNA (rDNA) and ribosomal RNA (rRNA) are over-represented in the starting material for genomic and cDNA libraries; thus, their sequences have the potential to enter the various databases repeatedly. When rDNA (both transcribed and intergenic spacer regions) is used as a query sequence, a great number of matches that are not identified as rDNA are found in the databases, particularly in the EST database and, to a lesser extent, among genomic sequences and STSs. We discuss the following explanations for the widespread occurrence of rDNA in cDNA and genomic DNA libraries: pseudogenes of rRNA in other genomic locations, mRNA-derived pseudogenes that reside in rDNA, cDNAs derived from rRNA [either by self-priming or by internal oligo(dT) priming], cDNAs derived from actual transcripts of the rDNA intergenic spacer, and genomic DNA contamination of RNA preparations. Because so many database entries contain unidentified rDNA, we recommend that submitters check all sequence submissions for the presence of structural RNAs in addition to repetitive sequences.
