CDS Annotation in Full-Length cDNA Sequence

Masaaki Furuno; Takeya Kasukawa; Rintaro Saito; Jun Adachi; Harukazu Suzuki; Richard Baldarelli; Yoshihide Hayashizaki; Yasushi Okazaki

doi:10.1101/gr.1060303

¹Laboratory for Genome Exploration Research Group, RIKEN Genomic Sciences Center (GSC), RIKEN Yokohama Institute, Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
²Multimedia Development Center, Advanced Technology Development Department, NTT Software Corporation, Naka-ku, Yokohama, Kanagawa 231-8554, Japan
³Institute for Advanced Biosciences, Keio University, Tsuruoka-city, Yamagata, 997-0017, Japan
⁴Mouse Genome Informatics Group, The Jackson Laboratory, Bar Harbor, Maine 04609, USA
⁵Genome Science Laboratory, RIKEN, Hirosawa, Wako, Saitama 351-0198, Japan

Abstract

The identification of coding sequences (CDS) is an important step in the functional annotation of genes. CDS prediction for mammalian genes from genomic sequence is complicated by the abundance of intergenic sequence in the genome and provides little information about how different parts of potential CDS regions are expressed. In contrast, mammalian gene CDS prediction from cDNA sequence offers obvious advantages, yet it encounters a different set of complexities when performed on high-throughput cDNA (HTC) sequences, such as the set of 60,770 cDNAs isolated from full-length enriched libraries of the FANTOM2 project. We developed a CDS annotation strategy that uses several CDS prediction programs to annotate the CDS regions of FANTOM2 cDNAs. These include rsCDS, which uses sequence similarity to known proteins; ProCrest; Longest-ORF and Truncated-ORF, which are ab initio predictors; and finally, DECODER and NCBI CDS predictor, which combine both principles. Curator inspection of the FANTOM2 CDS predictions, aided by graphical displays of these predictions in the context of other sequence similarity results for each cDNA, together with follow-up quality control procedures, yielded high-quality CDS predictions for a total of 14,345 FANTOM2 clones.
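As a rough illustration of the ab initio idea behind a longest-ORF style predictor, the sketch below scans the three forward reading frames of a cDNA sequence and reports the longest stretch running from an ATG to the first in-frame stop codon. This is not the FANTOM2 Longest-ORF program, whose internals are not described here. The function name `longest_orf`, the forward-strand-only scan, and the requirement of both an ATG and a stop codon are assumptions made for the example. Truncated ORFs, which lack a start or stop codon, are not handled.

```python
# Illustrative sketch only; not the Longest-ORF program used in FANTOM2.
# Assumptions: forward strand only, ATG as the sole start codon, and a
# complete ORF (start and stop both present) is required.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end, frame) of the longest ATG-to-stop ORF, or None.

    Coordinates are 0-based and end-exclusive; the span includes the stop codon.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None  # position of the first ATG since the last stop codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                # Keep this ORF only if it is strictly longer than the best so far.
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end, frame)
                start = None  # resume the search after this stop codon
    return best


if __name__ == "__main__":
    # Toy sequence with a single ORF in reading frame 2.
    print(longest_orf("CCATGAAATTTTAGGG"))
```

A similarity-based predictor such as rsCDS would instead choose the reading frame from alignments to known proteins, which this sketch does not attempt.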

Footnotes

[Supplemental material is available online at www.genome.org.]
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.1060303.
⁶Corresponding author. E-mail rgscerg@gsc.riken.go.jp; fax 81-45-503-9216.
- Received December 10, 2002.
- Accepted April 8, 2003.
Cold Spring Harbor Laboratory Press
