Identification of Alternate Polyadenylation Sites and Analysis of their Tissue Distribution Using EST Data

  Emmanuel Beaudoing and Daniel Gautheret¹
  Centre d'Immunologie de Marseille-Luminy, Institut National de la Santé et de la Recherche Médicale, Centre National de la Recherche Scientifique, Marseille Cedex 09, France

Abstract

Alternate polyadenylation affects a large fraction of higher eucaryote mRNAs, producing mature transcripts with 3′ ends of variable length. This variation is poorly represented in the current transcript catalogs derived from whole genome sequences, mostly because such posttranscriptional events are not directly detectable at the DNA level. Alternate polyadenylation of an mRNA is better understood by comparison to EST databases. Comparing ESTs to mRNAs, however, is a difficult task subject to several pitfalls: internal priming, the presence of intron sequences, repeated elements, chimeric ESTs, and matches with ESTs from paralogous genes. We present here a computer program that addresses these problems and displays EST matches to a query mRNA sequence in order to predict alternate polyadenylation and to suggest library-specific forms. The output highlights effective polyadenylation signals and possible sources of artifacts, such as A-rich stretches in the mRNA sequences, and allows direct visualization of EST libraries using color codes. Statistical biases in the distribution of alternative mRNA forms among EST libraries were systematically sought. About 1450 human and 200 mouse mRNAs displayed such biases, suggesting in each case a tissue- or disease-specific regulation of polyadenylation.
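As an illustration of the two sequence features the output is said to highlight, the following minimal sketch scans an mRNA for candidate polyadenylation signals and for A-rich stretches that could cause internal priming. It is not the authors' program. The function names, the choice of the two canonical hexamers AATAAA and ATTAAA, and the window size and A-count threshold are all assumptions made for this example; the abstract does not specify them.

```python
# Hypothetical helpers for illustration only; not taken from the published program.

# Canonical poly(A) signal hexamers. Assumption: the abstract does not say
# which signals the program actually searches for.
POLYA_SIGNALS = ("AATAAA", "ATTAAA")


def find_polya_signals(mrna, signals=POLYA_SIGNALS):
    """Return sorted (position, hexamer) pairs, using 0-based positions."""
    seq = mrna.upper().replace("U", "T")  # accept RNA or DNA alphabets
    hits = []
    for sig in signals:
        start = seq.find(sig)
        while start != -1:
            hits.append((start, sig))
            start = seq.find(sig, start + 1)  # step by 1 to allow overlaps
    return sorted(hits)


def find_a_rich_stretches(mrna, window=10, min_a=8):
    """Return merged half-open (start, end) intervals covered by windows
    of length `window` that contain at least `min_a` adenines.

    The defaults (10 and 8) are arbitrary assumptions for this sketch.
    """
    seq = mrna.upper()
    stretches = []
    for i in range(len(seq) - window + 1):
        if seq[i:i + window].count("A") >= min_a:
            if stretches and i <= stretches[-1][1]:
                # Window overlaps or touches the previous interval: extend it.
                stretches[-1] = (stretches[-1][0], i + window)
            else:
                stretches.append((i, i + window))
    return stretches
```

A caller would typically run both functions on the 3′ region of a query mRNA and treat an EST cluster ending just downstream of an A-rich stretch, with no nearby signal hexamer, as a likely internal-priming artifact rather than a genuine polyadenylation site.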

Footnotes

  • 1 Corresponding author. E-MAIL ; FAX 33-491-82-8621. Article published on-line before print: Genome Res., 10.1101/gr.190501.

  • Article and publication are at www.genome.org/cgi/doi/10.1101/gr.190501.

    • Received March 30, 2001.
    • Accepted June 12, 2001.