Technion - Israel Institute of Technology - Graduate School
M.Sc Thesis
M.Sc Student: Kohen Refael
Subject: Identifying Protein Recognition Elements on RNA by Combining Sequence and Structural Information
Department: Department of Biology
Supervisor: Professor Yael Mandel-Gutfreun


Abstract

RNA-binding proteins (RBPs) are responsible for many cellular processes that require the binding of RBPs to RNA. In many cases RBPs recognize their RNA targets in a specific manner. This specific recognition is usually based not only on the sequence of the RNA but also on its structure. By employing high-throughput binding assays, it is possible to extract the RNA sequences that are likely to bind a specific RBP. Computational approaches can then be used to detect the preferred binding motifs of an RBP. Most available bioinformatics programs that search for enriched sequence motifs in high-throughput data can detect the preferred sequence motifs of a given protein. However, the majority of these motif-finding tools do not consider the secondary structure of the RNA. Thus, there is currently no available tool that considers both sequence and structure as equal contributors to the motif.

The aim of this study was to develop a new computational approach for de-novo identification of RNA motifs that contain both sequence and secondary structure information. As a first step, the secondary structure of the RNA sequences is defined based on computational prediction or, when available, experimental evidence, and the sequences are encoded in a new alphabet that describes the identity and secondary structure of each base. We then employed two different motif search approaches to search for enriched motifs in the data translated into the new alphabet.
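The encoding step can be illustrated with a minimal sketch. The abstract does not specify the alphabet used in the thesis, so the one below is purely an assumption for illustration: an uppercase letter marks an unpaired base and a lowercase letter marks a paired base, with the secondary structure supplied in dot-bracket notation. The function names `encode` and `kmer_counts` are hypothetical, and the k-mer count is only a stand-in for a motif search, not either of the two approaches used in the thesis.

```python
from collections import Counter

# Assumed (hypothetical) 8-letter alphabet, not the one defined in the thesis:
#   uppercase A/C/G/U = unpaired base, lowercase a/c/g/u = paired base.
PAIRED = {"(", ")"}
UNPAIRED = {"."}


def encode(sequence: str, structure: str) -> str:
    """Merge an RNA sequence and its dot-bracket structure into one string."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    encoded = []
    for base, state in zip(sequence.upper(), structure):
        if base not in "ACGU":
            raise ValueError(f"unexpected base: {base!r}")
        if state in PAIRED:
            encoded.append(base.lower())   # paired position
        elif state in UNPAIRED:
            encoded.append(base)           # unpaired position
        else:
            raise ValueError(f"unexpected structure symbol: {state!r}")
    return "".join(encoded)


def kmer_counts(encoded_sequences, k: int) -> Counter:
    """Count every k-mer over the combined alphabet (a stand-in for motif search)."""
    counts = Counter()
    for seq in encoded_sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


if __name__ == "__main__":
    hairpin = encode("GGGAAAUCCC", "(((....)))")
    print(hairpin)
    print(kmer_counts([hairpin], 3).most_common(3))
```

Because each encoded symbol carries both base identity and pairing state, any sequence-based motif finder run on the encoded strings treats sequence and structure jointly, which is the idea described above.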

To validate the new approach and examine whether it can detect motifs carrying both sequence and structural information, we first tested the methods on RBPs whose binding motifs have been previously reported. In all examples tested, we obtained motifs that were similar in sequence to the previously reported motifs. Moreover, most of the motifs were preferentially found in a single-stranded conformation, as expected for RNA binding motifs, while some motifs were partially double-stranded and partially single-stranded. Interestingly, in most cases the combined sequence/structure motifs were significantly more conserved than the motifs with sequence information only. Finally, we used our method to predict the binding motif of a novel RBP, the Bicc1 protein.