Tandem Repeats in Protein Coding Regions of Primate Genes

Branko Borštnik 1,2 and Danilo Pumpernik 1

1 National Institute of Chemistry, SI-1001 Ljubljana, Slovenia

Abstract

Tandem repeats in GenBank primate nucleotide sequences annotated as protein coding regions are analyzed. Only trinucleotide repeats show repeat enrichment well above the threshold of statistical significance. The statistics are improved by searching for repeats simultaneously at the amino acid and nucleotide levels. The results for natural sequences are interpreted by comparing them with computer simulations of a model dedicated to protein coding regions. According to the simulation results, a limited set of trinucleotide repeats, namely cgg, ccg, cag, and gaa, coding for polyalanine, polyglycine, polyproline, polyglutamine, and polylysine, is prone to proliferation. Within the repeat regions, slippage is 10 times more frequent than point mutations, whereas the ratio of silent to recognizable point mutations is approximately the same as elsewhere in coding regions. Trinucleotide repeats cover slightly more than 0.3% of the protein coding regions of genes.
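As an illustration of what a search at both the nucleotide and the amino acid level can look like, the following is a minimal sketch and not the authors' procedure. It assumes perfect (uninterrupted) repeats, the standard genetic code, and an in-frame coding sequence. The function names and the copy-number thresholds are hypothetical and chosen only for the example.

```python
import re
from itertools import product

# Standard genetic code, codons enumerated in t, c, a, g order
# (third base varies fastest). "*" marks a stop codon.
BASES = "tcag"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate an in-frame coding sequence; unknown codons become 'X'."""
    cds = cds.lower()
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X")
                   for i in range(0, len(cds) - 2, 3))


def find_tandem_repeats(seq, unit_len=3, min_copies=4):
    """Nucleotide level: perfect tandem repeats of a unit of unit_len bases.

    Returns (start, unit, copies) tuples. The threshold is illustrative.
    """
    pattern = re.compile(r"(.{%d})\1{%d,}" % (unit_len, min_copies - 1))
    return [(m.start(), m.group(1), len(m.group(0)) // unit_len)
            for m in pattern.finditer(seq.lower())]


def find_amino_acid_runs(protein, min_len=4):
    """Amino acid level: runs of a single residue of at least min_len."""
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1), len(m.group(0)))
            for m in pattern.finditer(protein)]


# A pure cag repeat is visible at both levels.
pure = "atgaaa" + "cag" * 5 + "taa"
# A synonymous caa codon interrupts the nucleotide repeat,
# but the polyglutamine run remains intact in the protein.
mixed = "atg" + "cagcagcaacagcag" + "taa"

nt_pure = find_tandem_repeats(pure)
aa_pure = find_amino_acid_runs(translate(pure))
nt_mixed = find_tandem_repeats(mixed)
aa_mixed = find_amino_acid_runs(translate(mixed))
```

In the second sequence the nucleotide-level search finds nothing at this threshold, while the amino acid search still reports the glutamine run. This shows why looking at both levels at once can recover repeats that silent point mutations have obscured.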

Footnotes

  • 2 Corresponding author.

  • E-MAIL branko{at}hp10.ki.si; FAX (386-1) 4760-300.

  • Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.138802.

    • Received August 7, 2001.
    • Accepted March 25, 2002.