Ph.D. Thesis

Ph.D. Student: Anavy Leon
Subject: Synthetic DNA Libraries and their Applications in Data Storage and Biological Assays
Department: Department of Computer Science
Supervisor: Associate Professor Zohar Yakhini


Synthetic biology, which applies engineering approaches to the study and manipulation of biological systems, lies at the intersection of biology, biotechnology, computer science and other fields. Interdisciplinary and collaborative work in the field has produced scientific breakthroughs, including the design and production of de novo biological systems. Examples include systems for the mass production of chemical compounds, high-throughput assays, versatile sensors and biological computing devices.

Recently, two major developments have dramatically influenced the field. High-throughput DNA synthesis technology enables the production of synthetic Oligo Libraries (OLs) used for various purposes. CRISPR genome editing, a Nobel Prize-winning discovery, has the potential to revolutionize genetic engineering and the treatment of genetic disorders. The work presented in this thesis is closely related to both CRISPR genome editing and synthetic OLs. I present detailed descriptions of four completed research projects for which I was either a lead author or a co-advisor.

In one project, we developed a novel coding scheme that uses composite DNA letters to increase the logical density of DNA-based storage. We analyzed the theoretical properties of the coding scheme and performed a large-scale molecular implementation to demonstrate its feasibility and to explore its limitations and practical properties. In this implementation, we obtained a 25% increase in logical density compared to state-of-the-art systems. We also investigated the potential effect of composite DNA on the cost of DNA-based storage systems using an analytical cost model and simulations.

In a collaboration with the Amit Lab at the Technion, we performed a systematic exploration of the gene regulatory elements of E. coli. Using OL-based high-throughput assays and advanced statistical models, we identified sequence variants that control a novel gene silencing mechanism.

In a collaboration with Eitan Yaakobi from the Technion, we developed SOLQC, a software tool for quality control of OLs. SOLQC can be integrated into the analysis pipelines of OL-based projects and highlights OL error patterns, thus enabling troubleshooting and optimization.

In a collaboration with the Handel Lab at BIU, we developed CRISPECTOR, a tool for high-sensitivity assessment of off-target CRISPR editing activity. CRISPECTOR uses machine learning to analyze NGS data and quantify editing activity. It is especially useful at off-target sites with very low, but statistically significant, editing activity rates. It is also the first tool that supports the detection of translocation events from a multiplex PCR assay.

All of these research directions are likely to continue and to produce further findings through a combination of synthetic biology, algorithmics and statistics. In particular, cutting-edge data science and machine learning will continue to be incorporated into complex, high-throughput biological studies using synthetic DNA.