|Ph.D Student||Shazman Shula|
|Subject||Computational Approaches for Characterizing|
|Department||Department of Biology||Supervisor||Professor Yael Mandel-Gutfreun|
Knowledge of a protein's structure can provide new insights into its biological function and help design better experiments to study its biological roles. Moreover, deciphering the interactions of a protein with other molecules can contribute to understanding its function within the cell. During my PhD studies, I applied a machine learning approach to predict and classify RNA-binding proteins from their three-dimensional structures. The method is based on characterizing unique properties of electrostatic patches on the protein surface. Using an ensemble of general protein features and specific properties extracted from the electrostatic patches, a support vector machine (SVM) was successfully trained to differentiate RNA-binding proteins from other positively charged proteins that do not bind nucleic acids. Furthermore, by applying a multi-class SVM we were able to automatically classify the RNA-binding proteins by their RNA target, for example, whether they bind ribosomal RNA, transfer RNA or messenger RNA. However, the method was unable to differentiate RNA-binding proteins from DNA-binding proteins.
DNA- and RNA-binding proteins are expected to differ in their structural properties, consistent with the different properties of their natural ligands: DNA usually adopts a classical B-form double helix, while RNA adopts A-form helices that are frequently interrupted by internal loops and bulges. The geometry and shape of the binding interface are thought to play important roles in the specific recognition of each partner; however, there is currently no successful method for differentiating DNA-binding from RNA-binding interfaces. To address this, I applied a differential geometry method to characterize the difference between DNA- and RNA-binding interfaces. Until now, differential geometry has been used mainly in object recognition applications, such as three-dimensional (3D) facial recognition. In our study, we exploit differential geometry to describe the molecular surface as a distribution of local surface shapes. The uniqueness of the method is that it can describe any molecular surface shape regardless of ligand information, and it is not limited to protein-ligand interactions that follow the “lock and key” paradigm. By combining geometric and electrostatic properties with a machine learning approach trained on experimentally solved three-dimensional protein structures, we succeeded in differentiating between double-stranded DNA-binding and single-stranded RNA-binding proteins with 83% accuracy. Finally, we propose that the differential geometry approach employed here for predicting nucleic acid (NA)-binding interfaces will be applicable to many other molecular recognition problems, specifically in drug design.
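The idea of describing a surface as a distribution of local shapes can be illustrated with a minimal sketch. The abstract does not state which shape descriptor was used, so the sketch assumes the Koenderink shape index, a common choice that maps the two principal curvatures at a surface point to a single value in [-1, 1]. The function names `shape_index` and `shape_distribution`, the bin count, and the sign convention (which depends on the orientation of the surface normal) are all hypothetical choices made for illustration, and the principal curvatures are assumed to have been computed beforehand by a surface-fitting step that is not shown.

```python
import math

def shape_index(k1, k2):
    """Koenderink shape index in [-1, 1] from two principal curvatures.

    Assumed convention: after sorting so that k1 >= k2,
    s = (2/pi) * atan((k1 + k2) / (k1 - k2)).
    Returns None for a planar point (k1 == k2 == 0), where it is undefined.
    """
    if k1 < k2:
        k1, k2 = k2, k1
    if k1 == 0 and k2 == 0:
        return None
    # atan2 also handles umbilic points (k1 == k2), where the ratio diverges.
    return math.atan2(k1 + k2, k1 - k2) / (math.pi / 2)

def shape_distribution(curvatures, bins=9):
    """Normalized histogram of shape-index values over surface points.

    `curvatures` is an iterable of (k1, k2) pairs, one per surface point.
    Planar points are skipped. The result sums to 1 (or is all zeros if
    no point has a defined shape index).
    """
    counts = [0] * bins
    for k1, k2 in curvatures:
        s = shape_index(k1, k2)
        if s is None:
            continue
        # Map s from [-1, 1] to a bin index in [0, bins - 1].
        idx = int((s + 1.0) / 2.0 * bins)
        counts[max(0, min(idx, bins - 1))] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * bins

if __name__ == "__main__":
    # Illustrative curvature pairs only, not measured data.
    points = [(1.0, 1.0), (0.0, -1.0), (1.0, -1.0), (1.0, 0.0), (-1.0, -1.0)]
    print(shape_distribution(points, bins=5))
```

A histogram of this kind is a fixed-length vector that does not depend on any ligand information, so in principle it could be concatenated with electrostatic features and passed to a classifier such as an SVM. Whether this matches the descriptor and binning used in the thesis is not stated in the abstract.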