Technion - Israel Institute of Technology
Graduate School
Ph.D. Thesis
Ph.D. Student: Dror Iris
Subject: New Computational Approaches to Study Protein-Nucleic Acid Recognition
Department: Biology
Supervisor: Professor Yael Mandel-Gutfreund
Full Thesis Text: English Version


Abstract

Protein-DNA recognition is a critical component of gene regulatory processes, but the molecular mechanisms governing protein-DNA interactions are not yet fully understood. In my thesis, I focused on three aspects of protein-DNA recognition: predicting the binding interfaces of nucleic acid (NA)-binding proteins, investigating features of DNA binding sites that contribute to transcription factor (TF) specificity, and studying the covariation between proteins and their DNA binding sites.


In the first project, we focused on nucleic acid-binding proteins and attempted to predict their binding interfaces from protein structures. We developed a pipeline for extracting functional electrostatic surface patches from structural models of proteins. Using a combined-patch approach, we show that patches extracted from an ensemble of models predict the true nucleic acid-binding interfaces better than patches extracted from individual models. This approach can be used to predict nucleic acid-binding interfaces of proteins that do not have a solved structure.
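
The abstract does not specify how patches from an ensemble of models are combined. As a minimal sketch, assuming each structural model yields one patch represented as a set of residue numbers, and assuming a hypothetical consensus rule that keeps a residue if it appears in at least half of the per-model patches (not necessarily the rule used in the thesis), the combination and its evaluation against a known interface could look like this. The function names and toy residue sets are illustrative only.

```python
from collections import Counter


def combine_patches(patches, min_fraction=0.5):
    """Return residues found in at least `min_fraction` of the per-model patches.

    `patches` is a list of sets of residue numbers, one set per structural model.
    The consensus threshold is a hypothetical choice for illustration.
    """
    counts = Counter(res for patch in patches for res in patch)
    threshold = min_fraction * len(patches)
    return {res for res, n in counts.items() if n >= threshold}


def precision_recall(predicted, true_interface):
    """Precision and recall of predicted residues against the true interface."""
    hits = len(predicted & true_interface)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(true_interface) if true_interface else 0.0
    return precision, recall


# Toy data: three models of the same protein, each giving one electrostatic patch.
model_patches = [
    {10, 11, 12, 15, 40},
    {10, 11, 12, 16},
    {11, 12, 15, 16, 70},
]
true_interface = {10, 11, 12, 15, 16}

combined = combine_patches(model_patches)
print(sorted(combined))                             # [10, 11, 12, 15, 16]
print(precision_recall(combined, true_interface))   # (1.0, 1.0)
print([precision_recall(p, true_interface) for p in model_patches])
# [(0.8, 0.8), (1.0, 0.8), (0.8, 0.8)]
```

In this toy case, residues that appear in only one model (40, 70) are filtered out, which is the intuition behind combining patches across an ensemble rather than trusting any single model.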


Focusing on the DNA binding sites, we characterized different features that contribute to protein recognition. Genome-wide technologies identify the DNA binding sites of hundreds of TFs, from which a consensus motif can be derived for each TF. Although the binding motifs of many TFs have been characterized, it is still unclear what distinguishes a bound site containing a motif from the vast number of genomic regions that contain the same motif but are not bound by the TF. Recent in vitro binding assays demonstrate that TFs bind selectively to particular occurrences of their motifs, raising the possibility that cognate binding sites have unique intrinsic properties and that the information determining TF binding specificity is also encoded in the DNA surrounding the consensus motif. We investigated this hypothesis by analyzing the DNA sequence and shape surrounding the motifs of hundreds of TFs, using data from in vitro binding assays and in vivo ChIP-seq experiments.
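
The comparison of bound and unbound motif occurrences can be illustrated with a hypothetical sketch: locate exact occurrences of a motif, extract fixed-length flanks on both sides, and summarize the flanks with a simple sequence feature. GC content is used here purely as a stand-in; the thesis also analyzes DNA shape, which would require a dedicated shape-prediction tool and is not implemented here. The function names, flank length, motif and toy sequences are all assumptions for illustration.

```python
import re


def motif_flanks(sequence, motif, flank=10):
    """Yield (left, right) flanking sequences around each exact motif match.

    Matches too close to either end of the sequence to have full-length
    flanks are skipped.
    """
    # Zero-width lookahead so that overlapping occurrences are all reported.
    for m in re.finditer(f"(?={re.escape(motif)})", sequence):
        start, end = m.start(), m.start() + len(motif)
        if start >= flank and end + flank <= len(sequence):
            yield sequence[start - flank:start], sequence[end:end + flank]


def gc_content(seq):
    """Fraction of G and C bases in a sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def mean_flank_gc(sequences, motif, flank=10):
    """Mean GC content of the combined flanks over all motif occurrences."""
    values = [
        gc_content(left + right)
        for s in sequences
        for left, right in motif_flanks(s, motif, flank)
    ]
    return sum(values) / len(values) if values else float("nan")


# Invented toy regions: one "bound" and one "unbound" region sharing the motif.
bound = ["GCGCTAATGCGC"]
unbound = ["ATATTAATATAT"]
print(mean_flank_gc(bound, "TAAT", flank=4))    # 1.0
print(mean_flank_gc(unbound, "TAAT", flank=4))  # 0.0
```

Both toy regions contain the identical core motif, so any difference between them must come from the surrounding DNA, which is the kind of signal the hypothesis above proposes.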


Finally, a major goal in the field of protein-DNA interactions has been the identification of a recognition code that defines the preferred base-pair interactions of key amino acids. During my studies, I focused on the homeodomain family, which has been studied extensively. Although in recent years researchers have better defined the roles of individual recognition-helix residues in homeodomain-DNA recognition, the role of the N-terminal tail in determining binding specificity remains poorly understood. We examined the correlations between the amino acid sequences of homeodomains and the DNA sequence and shape attributes of their binding sites, and proposed a novel homeodomain shape recognition code that defines the preferred DNA shape for key amino acids in the N-terminal tail.
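
The abstract does not state which statistic was used to relate amino acids to DNA shape. One simple possibility, shown here as a hypothetical sketch, is a point-biserial correlation (a Pearson correlation in which one variable is a binary indicator) between the presence of a given amino acid at one homeodomain position and a shape value of the corresponding binding sites, such as minor groove width. The function names, residues and shape values below are invented toy data, not results from the thesis.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def residue_shape_correlation(residues, shape_values, amino_acid):
    """Point-biserial correlation between having `amino_acid` at one
    homeodomain position and a DNA shape value of each protein's binding site."""
    indicator = [1.0 if r == amino_acid else 0.0 for r in residues]
    return pearson(indicator, shape_values)


# Invented toy data: the residue at one N-terminal tail position for six
# homeodomains, and a mean minor groove width (in angstroms) at one site position.
residues = ["R", "R", "R", "K", "Q", "Q"]
groove_width = [3.0, 3.1, 2.9, 4.0, 4.2, 4.1]

r = residue_shape_correlation(residues, groove_width, "R")
print(round(r, 2))  # -0.99
```

A strongly negative value in this toy example would read as "arginine at this position prefers a narrower minor groove"; a real analysis would also need significance testing and correction for multiple positions and amino acids.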