Ph.D Thesis

Ph.D Student: Grimberg Noam
Subject: Revealing Novel Glycoside Hydrolases by Functional and Contextual Metagenomics
Department: Department of Energy
Supervisor: Prof. Yuval Shoham


The purpose of this study was to design new methods for function- and sequence-based screening of metagenomic libraries for novel glycoside hydrolases (GHs). In particular, the methods were designed to identify cellulases and hemicellulases that can be used in the production of bioethanol from plant cell wall biomass.

The functional assay was designed to be applicable in a high-throughput automated system. For this, several reducing-sugar assays were evaluated based on two parameters, specificity and sensitivity. The bicinchoninic acid (BCA) assay was chosen, as it gave the highest signal-to-noise ratio along with the highest sensitivity. However, the assay required incubation at high temperatures, which caused evaporation losses and thus prevented its application in the high-throughput robotic system.

In the main part of this study, a novel sequence-based screening algorithm was designed by integrating three approaches: 1) Designated annotation of proteins associated with biomass degradation pathways. 2) The Genomic Island Mapper (GIM), a computer-based algorithm that applies the genomic neighborhood approach to discover unknown proteins in genomic islands associated with biomass degradation. 3) A genomic conservation approach for the evaluation of putative novel GHs.
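The genomic neighborhood idea behind the second approach can be illustrated with a minimal sketch. It is not the GIM implementation, whose details are not given in this abstract. It assumes that each gene carries a scaffold name, a gene-order index, and an optional annotation, and that an "island" is a run of target-annotated genes separated by at most `max_gap` other genes. The names `Gene`, `Island`, `find_islands`, `TARGET_ANNOTATIONS`, and both thresholds are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical annotation labels standing in for "biomass degradation
# pathway associated proteins"; the thesis's real categories may differ.
TARGET_ANNOTATIONS = {"GH", "ABC_transporter"}


@dataclass
class Gene:
    name: str
    scaffold: str
    index: int                 # gene order along the scaffold
    annotation: Optional[str]  # None means "unknown protein"


@dataclass
class Island:
    scaffold: str
    start: int
    end: int
    n_targets: int
    unknowns: List[str]        # unknown proteins lying inside the island


def find_islands(genes, max_gap=2, min_targets=3):
    """Group nearby target genes per scaffold and report enclosed unknowns."""
    by_scaffold = defaultdict(list)
    for gene in genes:
        by_scaffold[gene.scaffold].append(gene)

    islands = []
    for scaffold, members in by_scaffold.items():
        members.sort(key=lambda g: g.index)
        runs, run = [], []
        for gene in members:
            if gene.annotation not in TARGET_ANNOTATIONS:
                continue
            # Start a new run when too many genes separate two targets.
            if run and gene.index - run[-1].index - 1 > max_gap:
                runs.append(run)
                run = []
            run.append(gene)
        if run:
            runs.append(run)

        for run in runs:
            if len(run) < min_targets:
                continue
            start, end = run[0].index, run[-1].index
            unknowns = [g.name for g in members
                        if start < g.index < end and g.annotation is None]
            islands.append(Island(scaffold, start, end, len(run), unknowns))
    return islands
```

Under these assumptions, an unknown protein flanked by several annotated GHs and transporter genes on the same scaffold is reported as a candidate, which is the kind of hit that the third approach (conservation of the neighborhood across genomes) would then evaluate.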

Following the scan of 75 thermal-spring metagenomes, 97 DNA scaffolds with 778 hemicellulolytic genes were identified. Out of the 97 DNA scaffolds, two genomic islands associated with plant cell wall degradation pathways were identified and taken for further analysis. The two genomic islands, genomic island 1 (GI1) and genomic island 2 (GI2), contained 32 proteins, among them GHs, ABC transporter systems with putative carbohydrate recognition modules, and two unknown proteins. The two unknown proteins exhibited genomic neighborhood conservation and were therefore considered putative novel GHs (nGH_127 and nGH_363). Phylogenetic analysis of the solute-binding proteins of the ABC transporter systems from both genomic islands suggests that they are sugar-binding proteins. Two novel thermostable cellulases were identified and characterized biochemically: Cel44N, a novel GH44 endo-cellulase originating from GI1, showed enhanced activity on carboxymethyl cellulose (CMC) as well as on phosphoric acid-swollen cellulose and xylans; Cel359, a novel GH5 cellulase originating from GI2, was found to be active on CMC and on several types of xylan. The cellulolytic activity of Cel44N and Cel359, along with the binding properties of the ABC system modules, indicates that GI1 and GI2 encode cell wall polysaccharide degradation pathways.

The Genomic Island Mapper algorithm was run on 75 thermal-spring metagenomes containing 2,843,368 DNA fragments with 4,489,353 protein-coding genes. Performance analysis of the algorithm found that the designated annotation reduced the analyzed data by ~80% and yielded 480,843 hits on proteins associated with biomass degradation pathways. The Genomic Island Mapper algorithm recognized 97 DNA scaffolds with 778 hemicellulolytic genes; thus, incorporating the genomic neighborhood approach into the algorithm reduced the number of putative novel enzyme candidates by ~90%. The genomic neighborhood conservation evaluation of the putative novel enzymes was found to be an essential step prior to biochemical characterization.