Technion - Israel Institute of Technology, Graduate School
Ph.D. Thesis
Ph.D. Student: Modai Hod Ronit
Subject: Identifying Functional Sites in Proteins Using a Multilevel Alphabet
Department: Biology
Supervisor: Professor Yael Mandel-Gutfreund
Full Thesis Text: English Version


Abstract

A protein's sequence dictates its structure and function. During evolution, proteins accumulate changes that allow species and individuals to diverge. Within a protein sequence, different positions evolve at different rates: some are highly conserved, while others diverge extensively. The evolutionary rate of individual residues therefore serves as a basis for predicting functional sites in proteins, on the assumption that such sites tend to be highly conserved. Although sequence conservation underlies many prediction methods, additional information on structural conservation has been shown to improve function prediction.
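As a minimal illustration of conservation-based scoring, the sketch below scores each column of a multiple sequence alignment by its Shannon entropy, where low entropy marks a conserved position. Entropy is a simple stand-in assumed here for illustration, not necessarily the evolutionary-rate measure used in this work; the function names and the default 1-bit threshold are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column: str, gap: str = "-") -> float:
    """Shannon entropy (bits) of the residues in one alignment column, ignoring gaps."""
    residues = [c for c in column if c != gap]
    if not residues:
        # An all-gap column carries no conservation signal; treat it as maximally variable.
        return float("inf")
    n = len(residues)
    counts = Counter(residues)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


def conserved_positions(alignment: list[str], max_entropy: float = 1.0) -> list[int]:
    """Return 0-based column indices whose entropy is at or below max_entropy (hypothetical cutoff)."""
    length = len(alignment[0])
    return [
        i
        for i in range(length)
        if column_entropy("".join(seq[i] for seq in alignment)) <= max_entropy
    ]
```

For the toy alignment `["ACDE", "ACDF", "ACEG", "AKEH"]`, the first three columns fall at or below 1 bit, while the fully variable last column (2 bits) does not.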

In this thesis, I present a novel Multilevel Alphabet (MLA), which we employed to search for functional sites in proteins that are conserved at both the physicochemical and the structural level, considering these features simultaneously. Using a motif search algorithm, we identified protein signatures within subsets of proteins that share common sequence and structural information. We demonstrated that the method can detect enriched structural motifs, such as the amphipathic helix, in large datasets of linear sequences, and can predict common structural properties of known functional motifs. We also applied the method to the yeast protein interactome and identified novel putative interaction motifs.
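To make the idea of a multilevel letter concrete, the following sketch assumes a hypothetical encoding in which each residue becomes a two-character letter: a physicochemical class (`h`, `p`, `c`) paired with a secondary-structure state (`H`, `E`, `C`). It then scores k-mers of these letters by their pseudocount-smoothed frequency in a target set relative to a background set. The class grouping, the letter format, and the enrichment ratio are illustrative assumptions, not the MLA or the motif search algorithm defined in this thesis.

```python
from collections import Counter

# Hypothetical grouping of the 20 amino acids into three physicochemical classes
# (an illustrative assumption, not the grouping used to define the MLA).
PHYSCHEM = {
    **dict.fromkeys("AVLIMFWC", "h"),  # hydrophobic
    **dict.fromkeys("STNQGPYH", "p"),  # polar / other
    **dict.fromkeys("DEKR", "c"),      # charged
}


def encode_mla(sequence: str, ss: str) -> list[str]:
    """Combine each residue's class with its secondary-structure state (H/E/C) into one letter."""
    if len(sequence) != len(ss):
        raise ValueError("sequence and secondary structure must have equal length")
    return [PHYSCHEM[aa] + state for aa, state in zip(sequence, ss)]


def kmer_counts(encoded: list[list[str]], k: int) -> Counter:
    """Count every length-k window of letters across all encoded proteins."""
    counts = Counter()
    for letters in encoded:
        for i in range(len(letters) - k + 1):
            counts[tuple(letters[i:i + k])] += 1
    return counts


def enrichment(target, background, k, pseudocount=1.0):
    """Target/background frequency ratio for each k-mer seen in the target set."""
    t, b = kmer_counts(target, k), kmer_counts(background, k)
    vocab = set(t) | set(b)
    t_total = sum(t.values()) + pseudocount * len(vocab)
    b_total = sum(b.values()) + pseudocount * len(vocab)
    return {
        kmer: ((t[kmer] + pseudocount) / t_total) / ((b[kmer] + pseudocount) / b_total)
        for kmer in t
    }
```

A ratio above 1 flags a letter pattern that is over-represented in the target proteins; in this toy scheme, alternating hydrophobic and charged helix letters (`hH`, `cH`) are the kind of pattern an amphipathic helix would produce.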

Further, we demonstrated that MLA-conserved residues are correlated with functional positions in proteins, such as hot spots, disease-related mutations, and cancer-related mutations. Although the correlation between functional sites and MLA-conserved residues was higher than the correlation with amino acid (AA)-conserved residues, the union of MLA-conserved and AA-conserved residues significantly improved the detection of different types of functional sites in proteins. The small overlap between the conserved residues detected by the two alphabets points to different evolutionary pressures acting on individual positions in proteins.
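The comparison between alphabets reduces to simple set operations on residue positions. The sketch below, with hypothetical function names and toy position sets (not data from this work), computes the fraction of functional sites recovered by each set of conserved residues and by their union, together with the Jaccard overlap between the two conserved sets.

```python
def recall(predicted: set[int], functional: set[int]) -> float:
    """Fraction of functional positions that appear among the predicted positions."""
    return len(predicted & functional) / len(functional) if functional else 0.0


def compare_alphabets(aa_conserved: set[int], mla_conserved: set[int], functional: set[int]) -> dict:
    """Recall of each conserved set and of their union, plus the Jaccard overlap between them."""
    union = aa_conserved | mla_conserved
    overlap = aa_conserved & mla_conserved
    return {
        "aa": recall(aa_conserved, functional),
        "mla": recall(mla_conserved, functional),
        "union": recall(union, functional),
        "jaccard": len(overlap) / len(union) if union else 0.0,
    }


# Toy example: residue positions are made up for illustration.
result = compare_alphabets({1, 2, 3, 10}, {3, 4, 5, 11}, {1, 3, 4, 5})
```

In this toy case the AA set recovers half of the functional positions and the MLA set three quarters, while their union recovers all of them despite a Jaccard overlap of only 1/7.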

Next, we derived a substitution matrix reflecting the cost of substitution between the letters of the multilevel alphabet. By employing a standard similarity search algorithm with the new substitution matrix, we succeeded in assigning structural classifications to, and predicting the function of, remote homologous proteins.
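A substitution matrix of this kind is commonly built from log-odds scores, s(a, b) = scale × log2(q_ab / (p_a × p_b)), where q_ab is the observed frequency of letters a and b aligned together and p_a, p_b are their background frequencies, as in BLOSUM. The sketch below assumes that approach and a set of already-aligned letter pairs, then plugs the matrix into Smith-Waterman local alignment with a linear gap penalty as one example of a standard similarity search. The pseudocount, scale, gap penalty, and letter names are hypothetical; the derivation and search tool used in the thesis may differ.

```python
import math


def log_odds_matrix(aligned_pairs, alphabet, scale=2.0, pseudocount=0.5):
    """Symmetric log-odds matrix: round(scale * log2(q_ab / (p_a * p_b)))."""
    # Start every pair at a small pseudocount so unseen pairs still get a finite score.
    counts = {(a, b): pseudocount for a in alphabet for b in alphabet}
    for a, b in aligned_pairs:
        # Count both orientations so that s(a, b) == s(b, a).
        counts[(a, b)] += 1
        counts[(b, a)] += 1
    total = sum(counts.values())
    q = {pair: c / total for pair, c in counts.items()}          # observed pair frequencies
    p = {a: sum(q[(a, b)] for b in alphabet) for a in alphabet}  # marginal (background) frequencies
    return {
        (a, b): round(scale * math.log2(q[(a, b)] / (p[a] * p[b])))
        for a in alphabet
        for b in alphabet
    }


def smith_waterman(x, y, matrix, gap=-4):
    """Best local alignment score between letter sequences x and y (linear gap penalty)."""
    prev = [0] * (len(y) + 1)
    best = 0
    for i in range(1, len(x) + 1):
        curr = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            curr[j] = max(
                0,                                            # start a new local alignment
                prev[j - 1] + matrix[(x[i - 1], y[j - 1])],  # align x[i-1] with y[j-1]
                prev[j] + gap,                                # x[i-1] against a gap
                curr[j - 1] + gap,                            # y[j-1] against a gap
            )
            best = max(best, curr[j])
        prev = curr
    return best
```

Because the matrix is indexed by letter pairs rather than single characters, the same alignment code works unchanged for multi-character alphabet letters such as `"hH"` or `"pC"`.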

Taken together, we introduce a new multilevel alphabet for protein representation that combines sequence and structural information. The MLA can be applied to detect enriched short motifs in proteins, identify novel functional sites, and predict protein function de novo, given either the sequence or the structure of a protein as input.