RCPdb: An evolutionary classification and codon usage database for repeat-containing proteins

  1. Noel G. Faux (1,2,3),
  2. Gavin A. Huttley (4),
  3. Khalid Mahmood (1,2,3),
  4. Geoffrey I. Webb (2,5),
  5. Maria Garcia de la Banda (2,5,6), and
  6. James C. Whisstock (1,2,3,6)
  1. Protein Crystallography Unit, Department of Biochemistry and Molecular Biology, Monash University, Clayton Campus, Melbourne, Victoria 3800, Australia;
  2. Victorian Bioinformatics Consortium, Monash University, Clayton Campus, Melbourne, Victoria 3800, Australia;
  3. ARC Centre for Structural and Functional Microbial Genomics, Monash University, Clayton Campus, Melbourne, Victoria 3800, Australia;
  4. John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory 0200, Australia;
  5. School of Computer Science and Software Engineering, Monash University, Clayton Campus, Melbourne, Victoria 3800, Australia

Abstract

Over 3% of human proteins contain single amino acid repeats (repeat-containing proteins, RCPs). Many repeats (homopeptides) localize to important proteins involved in transcription, and the expansion of certain repeats, in particular poly-Q and poly-A tracts, can also lead to the development of neurological diseases. Previous studies have suggested that the homopeptide makeup is a result of the presence of G+C-rich tracts in the encoding genes and that expansion occurs via replication slippage. Here, we have performed a large-scale genomic analysis of the variation of the genes encoding RCPs in 13 species and present these data in an online database (http://repeats.med.monash.edu.au/genetic_analysis/). This resource allows rapid comparison and analysis of RCPs, homopeptides, and their underlying genetic tracts across the eukaryotic species considered. We report three major findings. First, there is a bias for a small subset of codons being reiterated within homopeptides, and there is no G+C or A+T bias relative to the organism’s transcriptome. Second, single base pair transversions from the homocodon are unusually common and may represent a mechanism of reducing the rate of homopeptide mutations. Third, homopeptides that are conserved across different species lie within regions that are under stronger purifying selection in contrast to nonconserved homopeptides.
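To make the terms above concrete, the following is a minimal Python sketch of how a homopeptide, its underlying codons, and a single-base deviation from the homocodon might be identified. It is an illustration only and not the pipeline used to build RCPdb. The function names, the `min_len` threshold of five residues, the choice of the most frequent codon in a run as the "homocodon", and the toy sequences are all assumptions introduced here.

```python
from collections import Counter
from itertools import groupby

PURINES = {"A", "G"}  # the pyrimidines are C and T


def homopeptide_runs(protein, min_len=5):
    """Yield (start, end, residue) for runs of a single residue.

    min_len is a hypothetical threshold; the source does not state one here.
    """
    pos = 0
    for residue, group in groupby(protein):
        length = len(list(group))
        if length >= min_len:
            yield pos, pos + length, residue
        pos += length


def codons_for_run(cds, start, end):
    """Return the codons of an in-frame coding sequence for residues start..end-1."""
    return [cds[3 * i:3 * i + 3] for i in range(start, end)]


def substitution_type(a, b):
    """Classify a single-base change as a transition or a transversion."""
    same_class = (a in PURINES) == (b in PURINES)
    return "transition" if same_class else "transversion"


def single_base_variants(codons):
    """Find codons that differ from the run's most frequent codon at exactly one base.

    Treating the most frequent codon as the homocodon is an assumption of this sketch.
    """
    homocodon, _ = Counter(codons).most_common(1)[0]
    variants = []
    for codon in codons:
        diffs = [(x, y) for x, y in zip(homocodon, codon) if x != y]
        if len(diffs) == 1:
            variants.append((codon, substitution_type(*diffs[0])))
    return homocodon, variants


# Toy example: a poly-A tract encoded mostly by GCG, interrupted by one GCC codon.
protein = "MAAAAAAK"
cds = "ATGGCGGCGGCCGCGGCGGCGAAA"
for start, end, residue in homopeptide_runs(protein):
    homocodon, variants = single_base_variants(codons_for_run(cds, start, end))
    print(residue, homocodon, variants)
```

In the toy sequence, the single GCC codon differs from GCG by a G-to-C change at the third position, which is a purine-to-pyrimidine substitution and therefore a transversion that leaves the encoded alanine unchanged.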

Footnotes

  • 6 Corresponding authors.

    6 E-mail Maria.GarciadelaBanda{at}infotech.monash.edu.au; fax 61 3 9905 4699.

    6 E-mail James.Whisstock{at}med.monash.edu.au; fax 61 3 9905 4699.

  • [Supplemental material is available online at www.genome.org.]

  • Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.6255407

    • Received January 2, 2007.
    • Accepted April 10, 2007.
  • Freely available online through the Genome Research Open Access option.
