Prediction of RNA binding sites in proteins from amino acid sequence

Michael Terribilini [1,2], Jae-Hyung Lee [1,2], Changhui Yan [3], Robert L. Jernigan [1,4,5], Vasant Honavar [1,3,5,6], and Drena Dobbs [1,2,5,7]

  1. Bioinformatics and Computational Biology Graduate Program, Iowa State University, Ames, Iowa 50010, USA
  2. Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, Iowa 50010, USA
  3. Department of Computer Science, Utah State University, Logan, Utah 84341, USA
  4. Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, Iowa 50010, USA
  5. Laurence H. Baker Center for Bioinformatics and Biological Statistics, Iowa State University, Ames, Iowa 50010, USA
  6. Department of Computer Science, Iowa State University, Ames, Iowa 50010, USA
  7. Center for Computational Intelligence, Learning, and Discovery, Iowa State University, Ames, Iowa 50010, USA

Abstract

RNA–protein interactions are vitally important in a wide range of biological processes, including regulation of gene expression, protein synthesis, and replication and assembly of many viruses. We have developed a computational tool for predicting which amino acids of an RNA binding protein participate in RNA–protein interactions, using only the protein sequence as input. RNABindR was developed using machine learning on a validated nonredundant data set of interfaces from known RNA–protein complexes in the Protein Data Bank. It generates a classifier that captures primary sequence signals sufficient for predicting which amino acids in a given protein are located in the RNA–protein interface. In leave-one-out cross-validation experiments, RNABindR identifies interface residues with >85% overall accuracy. It can be calibrated by the user to obtain either high specificity or high sensitivity for interface residues. RNABindR, implementing a Naive Bayes classifier, performs as well as a more complex neural network classifier (to our knowledge, the only previously published sequence-based method for RNA binding site prediction) and offers the advantages of speed, simplicity and interpretability of results. RNABindR predictions on the human telomerase protein hTERT are in good agreement with experimental data. The availability of computational tools for predicting which residues in an RNA binding protein are likely to contact RNA should facilitate design of experiments to directly test RNA binding function and contribute to our understanding of the diversity, mechanisms, and regulation of RNA–protein complexes in biological systems. (RNABindR is available as a Web tool from http://bindr.gdcb.iastate.edu.)
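
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea it describes: a Naive Bayes classifier that labels each residue as interface or non-interface from a window of neighboring amino acids, with a user-adjustable threshold that trades specificity against sensitivity. The class name `WindowNaiveBayes`, the window half-width, the pseudocount, the padding symbol, and the toy training data are all assumptions made for illustration. They are not taken from RNABindR.

```python
import math
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "-"  # assumed padding symbol for positions beyond the sequence ends


def windows(sequence, half_width):
    """Return one fixed-length window per residue, centered on that residue."""
    padded = PAD * half_width + sequence + PAD * half_width
    size = 2 * half_width + 1
    return [padded[i:i + size] for i in range(len(sequence))]


class WindowNaiveBayes:
    """Illustrative Naive Bayes over sequence windows (hypothetical, not RNABindR)."""

    def __init__(self, half_width=2, pseudocount=1.0):
        self.half_width = half_width      # assumed value, not from the paper
        self.pseudocount = pseudocount    # Laplace-style smoothing, assumed
        # counts[label][(window position, residue)] -> number of occurrences
        self.counts = {0: defaultdict(float), 1: defaultdict(float)}
        self.class_totals = {0: 0, 1: 0}

    def fit(self, sequences, labels):
        """labels[i] is a string of '0'/'1', one character per residue."""
        for seq, lab in zip(sequences, labels):
            for window, y in zip(windows(seq, self.half_width), lab):
                y = int(y)
                self.class_totals[y] += 1
                for pos, aa in enumerate(window):
                    self.counts[y][(pos, aa)] += 1
        return self

    def log_odds(self, window):
        """Log of P(interface | window) / P(non-interface | window)."""
        pc = self.pseudocount
        n_symbols = len(AMINO_ACIDS) + 1  # 20 residues plus the pad symbol
        score = math.log((self.class_totals[1] + pc) / (self.class_totals[0] + pc))
        for pos, aa in enumerate(window):
            p1 = (self.counts[1][(pos, aa)] + pc) / (self.class_totals[1] + pc * n_symbols)
            p0 = (self.counts[0][(pos, aa)] + pc) / (self.class_totals[0] + pc * n_symbols)
            score += math.log(p1 / p0)
        return score

    def predict(self, sequence, threshold=0.0):
        """Higher thresholds give fewer positives (more specific, less sensitive)."""
        return [int(self.log_odds(w) > threshold)
                for w in windows(sequence, self.half_width)]


if __name__ == "__main__":
    # Toy data invented for this sketch: '1' marks a hypothetical interface residue.
    train_seqs = ["AKRAG", "GGKRL", "LLAGA"]
    train_labels = ["01100", "00110", "00000"]
    model = WindowNaiveBayes(half_width=1).fit(train_seqs, train_labels)
    for t in (-1.0, 0.0, 1.0):
        print(t, model.predict("GAKRL", threshold=t))
```

Because the classifier reduces to a sum of per-position log-likelihood ratios, each prediction can be traced to the residues that contributed most to the score, which is one way to read the interpretability advantage the abstract mentions.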

Footnotes

  • Reprint requests to: Michael Terribilini, Bioinformatics and Computational Biology Graduate Program, Iowa State University, Ames, Iowa 50010, USA; e-mail: terrible{at}iastate.edu; fax: (515) 294-6790.

  • Article published online ahead of print. Article and publication date are at http://www.rnajournal.org/cgi/doi/10.1261/rna.2197306.

    • Received August 18, 2005.
    • Accepted May 13, 2006.