Ph.D Thesis

Ph.D Student: Haimov Boris
Subject: Simulation of Protein Folding with Reduced Representation Based on Statistical Knowledge
Department: Department of Nanoscience and Nanotechnology
Supervisor: Associate Prof. Simcha Srebnik
Full Thesis Text: English Version


The Computational Protein Folding Prediction (CPFP) problem is one of the most interesting challenges in structural bioinformatics. The challenge in CPFP is to computationally fold unfolded polypeptide chains into functional 3D proteins given only their amino-acid sequences. According to Levinthal’s paradox, if folding were a random search over conformations, the time required to fold a polypeptide of 100 amino acids would exceed the age of the universe; nevertheless, proteins usually fold to their functional shapes within seconds. Levinthal’s paradox emphasizes both the difficulty and the importance of CPFP.
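The arithmetic behind Levinthal’s paradox can be checked with a back-of-the-envelope sketch. The number of conformational states per residue (3) and the sampling rate (10^13 conformations per second) are commonly quoted textbook assumptions, not values taken from this thesis, and the function name is hypothetical.

```python
# Rough Levinthal estimate. Both default parameters are illustrative
# assumptions (common textbook values), not figures from the thesis.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10  # approximate


def levinthal_search_time_years(n_residues,
                                states_per_residue=3,
                                samples_per_second=1e13):
    """Years needed to enumerate every conformation of the chain once."""
    n_conformations = states_per_residue ** n_residues
    return n_conformations / samples_per_second / SECONDS_PER_YEAR


years = levinthal_search_time_years(100)
print(f"Exhaustive search: about {years:.1e} years")
print(f"Exceeds age of universe: {years > AGE_OF_UNIVERSE_YEARS}")
```

Even with these generous assumptions, the exhaustive search time is many orders of magnitude larger than the age of the universe, which is why a folding simulation cannot rely on random sampling.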

CPFP includes approaches that represent systems in full atomistic detail with explicit water molecules, as well as models with reduced representation. Systems represented in full atomistic detail are simulated with fundamental force-field equations. Such approaches are slow and require expensive hardware. In contrast, models that use a reduced representation of the protein, implicit water, and residue-residue interaction matrices are faster and more affordable.

A notable example of successful protein folding in silico, performed in full atomistic detail on a specially designed machine, was reported in 2011 by Shaw and coworkers. In their study, the authors folded 12 structurally diverse proteins, each shorter than 100 amino acids. Two years later, in 2013, Sosnick and coworkers reported folding similar proteins on a non-specialized computer within 600 CPU hours per protein by using a simplified protein representation.

Evidently, reduced representation is the key to efficient simulation; however, a universal force-field for simulations with reduced representation has not yet been defined. Force-fields for reduced representation must correctly define the energetic cost of interactions between residues, in some cases as interaction matrices and in other cases as hydrophobicity scales. Many different hydrophobicity scales have been reported in the literature; nevertheless, a universal force-field remains unknown.

This research approaches the CPFP problem by: (1) defining an efficient representation of proteins to achieve better simulation performance, and (2) understanding the hydrogen-bonding and residue-residue interactions that allow the native conformation to be correctly associated with the minimum-energy conformation.

The results of this research show that: (1) performing the simulation explicitly in bend, roll, and displacement (BRAD) space yields a significant improvement in performance, and (2) 20x20 residue-residue interaction matrices are not sufficient to define a force-field that correctly associates native folded conformations with minimum-energy conformations. By taking into account the nearest chemically bonded neighbors of every amino-acid residue along the polypeptide chain, we obtain a matrix that defines the interactions between one amino-acid triplet (20^3 = 8000 possible triplets) and another. The use of 8000x8000 matrices allowed us to define a force-field that correctly associates the native folded conformation with the minimum-energy conformation for the proteins used in this study.
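The triplet bookkeeping described above can be illustrated with a minimal sketch. It is not the thesis implementation: the residue ordering, the skipping of chain termini, the symmetric lookup, and the sparse `table` dictionary standing in for an 8000x8000 matrix are all assumptions made for illustration, and every function name is hypothetical.

```python
# Illustrative sketch only; names and conventions are assumptions,
# not the thesis's actual data structures.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 residue types
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def triplet_index(prev_aa, aa, next_aa):
    """Map a (previous, central, next) residue triplet to 0..7999 (base-20)."""
    return (AA_INDEX[prev_aa] * 20 + AA_INDEX[aa]) * 20 + AA_INDEX[next_aa]


def triplet_indices(sequence):
    """Triplet index for each interior position of the chain.

    Assumption: terminal residues have only one bonded neighbor and are
    skipped here; the thesis may treat them differently.
    """
    return {i: triplet_index(sequence[i - 1], sequence[i], sequence[i + 1])
            for i in range(1, len(sequence) - 1)}


def contact_energy(sequence, contacts, table):
    """Sum pair energies over contacting positions.

    `table` is a hypothetical sparse stand-in for the 8000x8000 matrix:
    a dict keyed by (smaller index, larger index), with missing entries
    treated as zero and the interaction assumed symmetric.
    """
    idx = triplet_indices(sequence)
    total = 0.0
    for i, j in contacts:
        key = tuple(sorted((idx[i], idx[j])))
        total += table.get(key, 0.0)
    return total


# Usage with made-up numbers: one contact between positions 1 and 3.
seq = "ACDEF"
idx = triplet_indices(seq)
toy_table = {tuple(sorted((idx[1], idx[3]))): -1.5}
print(contact_energy(seq, [(1, 3)], toy_table))
```

A dense 8000x8000 table of 64-bit floats would occupy roughly 512 MB, which is one reason the sketch uses a sparse dictionary; a dense array is equally valid when memory allows.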