M.Sc Thesis

M.Sc. Student: David Ya'ara
Subject: Analysis of DNA Time of Replication - A Signal Processing
Department: Department of Electrical and Computer Engineering
Supervisors: Associate Prof. Zohar Yakhini, Prof. Yonina Eldar


Replication of the genome in a dividing cell is a highly regulated process that takes place during a stage of the cell cycle called the S-phase, which can last several hours, depending on the organism. Generally speaking, each genomic locus is replicated at a specific time within the S-phase: its time of replication (ToR). It was recently shown that the replication structure of the genome consists of two types of regions: regions that are densely populated with origins of replication, which are activated at approximately the same time so that the region has a constant ToR, and large replicons that contain no active origins, in which the ToR changes gradually. ToR can be measured by hybridizing labeled DNA from S-phase cells to a DNA microarray and comparing it against DNA from non-replicating cells.

In this work we present an algorithm called ARTO (Analysis of Replication Timing and Organization) that performs a continuous piecewise-linear segmentation of raw ToR measurements. ARTO first finds potential lines in a signal window using a modification of the Hough transform, a common computer vision technique. Based on these lines, it then computes an approximation to the best segmentation of the window using a dynamic programming procedure. For each genomic location, ARTO outputs an estimate of its ToR as well as an assignment to a type of replication structure; this assignment is its major benefit. Using synthetic simulations, we show that ARTO produces accurate results in both ToR estimation and replication structure assignment. Analysis of the results of applying our algorithm to six tissue types in human and mouse reveals correlations between ToR and genomic features such as open chromatin and association with the nuclear envelope. More importantly, our algorithm enables, for the first time, a genome-wide statistical analysis of the properties of the two types of replication regions.
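The two-step idea described above (Hough voting for candidate lines, then dynamic programming over those lines) can be illustrated with a minimal Python sketch. This is not ARTO's implementation and rests on stated assumptions: it uses the classic slope-intercept Hough transform rather than the modified version used in the thesis, it does not enforce continuity between adjacent segments (ARTO's segmentation is continuous), and it labels each segment as constant ToR or gradual ToR with a hypothetical slope threshold `flat_tol`. All function names, parameters (`bin_width`, `switch_penalty`, `flat_tol`), the slope grid, and the synthetic signal are illustrative.

```python
import numpy as np

# Illustrative sketch only, not ARTO's actual implementation.
# All names and parameter values below are hypothetical.


def hough_lines(x, y, slopes, bin_width=0.02, n_lines=2, suppress=2):
    """Classic slope-intercept Hough transform over one signal window.

    Each point (x_i, y_i) votes, for every candidate slope a, for the
    intercept b = y_i - a * x_i. Peaks in the (slope, intercept)
    accumulator are lines that pass through many points.
    """
    b = y[None, :] - slopes[:, None] * x[None, :]        # (n_slopes, n_points)
    b0 = b.min()
    idx = np.floor((b - b0) / bin_width).astype(int)     # intercept bin of each vote
    acc = np.zeros((len(slopes), idx.max() + 1), dtype=int)
    for s in range(len(slopes)):
        acc[s] = np.bincount(idx[s], minlength=acc.shape[1])
    lines = []
    for _ in range(n_lines):
        s, k = np.unravel_index(np.argmax(acc), acc.shape)
        if acc[s, k] == 0:
            break
        lines.append((float(slopes[s]), b0 + (k + 0.5) * bin_width))
        # Non-maximum suppression: clear the peak's neighborhood so the
        # next pick is a genuinely different line.
        acc[max(0, s - suppress):s + suppress + 1,
            max(0, k - suppress):k + suppress + 1] = 0
    return lines


def dp_labels(x, y, lines, switch_penalty=0.05):
    """Assign each point to one candidate line (Viterbi-style DP).

    Cost = sum of squared residuals + switch_penalty per change of line.
    """
    a = np.array([ln[0] for ln in lines])
    c = np.array([ln[1] for ln in lines])
    resid = (y[:, None] - (x[:, None] * a[None, :] + c[None, :])) ** 2  # (n, L)
    n, L = resid.shape
    switch = switch_penalty * (1.0 - np.eye(L))
    cost = resid[0].copy()
    back = np.zeros((n, L), dtype=int)
    for i in range(1, n):
        trans = cost[:, None] + switch          # trans[j, k]: line j -> line k
        back[i] = np.argmin(trans, axis=0)      # best predecessor for each k
        cost = trans.min(axis=0) + resid[i]
    labels = np.empty(n, dtype=int)
    labels[-1] = np.argmin(cost)
    for i in range(n - 1, 0, -1):               # backtrack the optimal path
        labels[i - 1] = back[i, labels[i]]
    return labels


def segment_window(x, y, slopes, flat_tol=0.005, switch_penalty=0.05, **hough_kw):
    """Return (start_x, end_x, slope, kind) for each run of equal labels."""
    labels = dp_labels(x, y, hough_lines(x, y, slopes, **hough_kw), switch_penalty)
    segments, start = [], 0
    for i in range(1, len(x) + 1):
        if i == len(x) or labels[i] != labels[start]:
            xs, ys = x[start:i], y[start:i]
            # Refit each run by least squares to estimate its slope.
            slope = float(np.polyfit(xs, ys, 1)[0]) if len(xs) > 1 else 0.0
            kind = "constant" if abs(slope) < flat_tol else "gradual"
            segments.append((int(xs[0]), int(xs[-1]), slope, kind))
            start = i
    return segments


# Synthetic window: a flat (constant-ToR) block followed by a ramp.
x = np.arange(60, dtype=float)
y = np.where(x < 30, 0.5, 0.5 + 0.03 * (x - 30))
slopes = np.round(np.linspace(-0.05, 0.05, 21), 3)
for seg in segment_window(x, y, slopes):
    print(seg)
```

In this sketch the switch penalty acts as a cost on model complexity: without it the DP could hop between candidate lines at every point, while a larger value favors fewer, longer segments. On real, noisy ToR data the slope grid, bin width, penalty, and threshold would all need tuning.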
We found that the large replicons that contain no active origins are associated with a closed chromatin environment, whereas transcribed regions, which are usually located in an open chromatin environment, tend to reside in regions with several active origins. These findings may shed light on the mechanism that regulates ToR.