Technion - Israel Institute of Technology - Graduate School
Ph.D Thesis
Ph.D Student: Kligun Efrat
Subject: A Computational Approach to Study RNA Recognition by Small Ligands and Proteins
Department: Department of Biology
Supervisor: Professor Yael Mandel-Gutfreund
Full Thesis Text: English Version


Abstract

RNA molecules can fold into many conformations, providing many potential pockets for binding small molecules as well as interfaces for protein binding. The interactions between RNA and other molecules play a key role in many biological processes in the gene expression pathway. The main goal of my research was to study RNA recognition by small ligands and proteins and to identify novel RNA-binding proteins (RBPs).

To study RNA recognition by small ligands, we extracted RNA-ligand complexes from different RNA groups from the Protein Data Bank (PDB). We analysed the chemical, physical, structural and conformational properties of the binding pockets around the ligand. Comparing the properties of ligand-binding pockets to those of computed pockets extracted from all available RNA structures revealed that ligand-binding pockets, mainly the adaptive pockets, are characterized by unique properties and are specifically enriched in rare conformations of the nucleobase and the sugar pucker. Further, we demonstrated that nucleotides possessing the rare conformations are preferentially involved in direct interactions with the ligand.
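As an illustration of what a "rare conformation" can mean in practice, the following minimal Python sketch flags nucleotides by two torsion-based descriptors. It is not the procedure used in the thesis. It assumes the IUPAC convention for the glycosidic torsion (syn is 0 ± 90 degrees, anti is 180 ± 90 degrees), a broad two-state split of the pseudorotation phase into C3'-endo-like and C2'-endo-like puckers, and that the anti / C3'-endo state typical of A-form RNA is the common one. The function names are hypothetical.

```python
def base_orientation(chi_deg: float) -> str:
    """Classify the glycosidic torsion chi (degrees) as 'syn' or 'anti'.

    Assumption: IUPAC convention, syn = 0 +/- 90 degrees, anti = 180 +/- 90.
    """
    chi = (chi_deg + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
    return "syn" if -90.0 <= chi <= 90.0 else "anti"


def sugar_pucker(phase_deg: float) -> str:
    """Classify the pseudorotation phase P (degrees) into two broad states.

    Assumption: P in [270, 360) or [0, 90) is C3'-endo-like (North),
    anything else is C2'-endo-like (South).
    """
    p = phase_deg % 360.0
    return "C3'-endo" if (p < 90.0 or p >= 270.0) else "C2'-endo"


def is_rare(chi_deg: float, phase_deg: float) -> bool:
    """Flag a nucleotide that departs from the anti / C3'-endo state."""
    return (base_orientation(chi_deg) == "syn"
            or sugar_pucker(phase_deg) == "C2'-endo")


def rare_fraction(nucleotides) -> float:
    """Fraction of (chi, P) pairs flagged as rare; 0.0 for an empty input."""
    nucleotides = list(nucleotides)
    if not nucleotides:
        return 0.0
    return sum(is_rare(chi, p) for chi, p in nucleotides) / len(nucleotides)
```

Computing `rare_fraction` separately for pocket-lining nucleotides and for a background set would give the two proportions whose comparison underlies an enrichment claim of this kind; the choice of statistical test is left open here.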

Interactions between protein and RNA are mediated through a variety of RNA-binding protein domains, among them the highly abundant RNA recognition motif (RRM). Here we studied protein-RNA complexes from different RNA-binding domain families solved by NMR and X-ray crystallography. Characterizing the structural properties of the RNA at the binding interfaces revealed an unexpectedly high number of nucleotides with unusual RNA conformations, found specifically in RNA-RRM complexes. Moreover, we observed that the RNA nucleotides that are directly involved in interactions with the RRM domains, via hydrogen bonds and hydrophobic contacts, are significantly enriched in unique RNA conformations.

Only a small fraction of RBPs have been fully characterized, and the current assumption is that the majority of RBPs, many of which do not share sequence or structural homology with known RBPs, are yet to be discovered. Here, we developed a structure-based computational method for predicting RBPs from structural models of proteins that are themselves predicted from sequence, without relying on homology. We applied a machine learning approach for classifying RNA-binding proteins based on features extracted from their primary and predicted three-dimensional structures. The method was trained on a set of known RBPs and then tested on novel human and mouse RBPs with no recognizable RNA-binding domains, identified in high-throughput interactome capture experiments. We showed that our method successfully differentiates between RBPs and non-nucleic-acid-binding proteins (NNBPs), with an area under the ROC curve (AUC) of 0.76. In addition, we tried to differentiate between RBPs and DNA-binding proteins. For testing, we used novel human DNA-binding proteins identified by protein microarray assays and applied a machine learning approach. In the latter case the performance was significantly lower, with an AUC of 0.67; nevertheless, this is the first successful attempt to separate DNA-binding from RNA-binding proteins without relying on homology.
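For readers unfamiliar with the AUC figures quoted above, the following minimal Python sketch shows how an area under the ROC curve can be computed from classifier scores. It uses the pairwise-ranking definition (the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half). The function name `roc_auc` and the toy labels and scores are hypothetical and are not data from the thesis.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 1 (positive, e.g. RBP) or 0 (negative, e.g. NNBP).
    scores: classifier scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy example with made-up scores: 8 of the 9 positive/negative pairs
# are ordered correctly.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
```

Under this definition an AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the reported values of 0.76 and 0.67.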