SGP-1: Prediction and Validation of Homologous Genes Based on Sequence Alignments

Thomas Wiehe (1,3), Steffi Gebauer-Jung (1), Thomas Mitchell-Olds (1), and Roderic Guigó (2)

(1) Max Planck Institute for Chemical Ecology, Jena, Germany; (2) Grup de Recerca en Informàtica Biomèdica, Institut Municipal d'Investigació Mèdica, Universitat Pompeu Fabra, Barcelona, Spain

Abstract

Conventional methods of gene prediction rely on the recognition of DNA-sequence signals, on coding potential, or on the comparison of a genomic sequence with a cDNA, EST, or protein database. Their accuracy is limited in many circumstances by the dependence on species-specific training and by the incompleteness of reference databases. Lately, comparative genome analysis has attracted increasing attention, and several analysis tools based on human/mouse comparisons are already available. Here, we present SGP-1 (Syntenic Gene Prediction), a program for the prediction of protein-coding genes that is based on the similarity of homologous genomic sequences. In contrast to most existing tools, the accuracy of SGP-1 depends little on species-specific properties such as codon usage or the nucleotide distribution. SGP-1 may therefore be applied to nonstandard model organisms in vertebrates as well as in plants, without the need for extensive parameter training. In addition to predicting genes in large-scale genomic sequences, the program may be useful for validating gene structure annotations from databases. To this end, SGP-1 output also contains comparisons between predicted and annotated gene structures in HTML format. The program can be accessed via a Web server at http://soft.ice.mpg.de/sgp-1. The source code, written in ANSI C, is available on request from the authors.

Footnotes

  • 3 Corresponding author.

  • E-MAIL twiehe{at}ice.mpg.de; FAX 49-3641-64-3668.

  • Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.177401.

    • Received January 1, 2001.
    • Accepted June 5, 2001.