M.Sc Student | Sabary Omer |
---|---|

Subject | Reconstruction Algorithms for DNA-Based Storage Systems |

Department | Department of Computer Science |

Supervisor | Professor Eitan Yaakobi |

Full Thesis text |

In the ** trace
reconstruction problem** a length-n string

Motivated by
the storage channel of DNA, this work is focused on another variation of the
trace reconstruction problem, which is referred by the ** DNA reconstruction
problem**. The setup is similar to the trace reconstruction problem. A length-n
string

This work consists of three parts. The first part presents a software tool to support quality control of synthetic DNA libraries. This part includes a comprehensive study and characterization of the errors in the DNA storage channel.

The second part studies the maximum likelihood (ML) decoder for two deletion channels. In this part, we calculate the error and the failure probability of the ML decoder, and also simulate it to verify our results.

In the third part we present several new
algorithms for the DNA reconstruction problem and for the deletion DNA
reconstruction problem. While most of the previous algorithms look locally on
each symbol and use a symbol-wise majority technique, our algorithms look
globally on the entire sequence of the traces and use dynamic programming
algorithms to estimate the original sequence. This global approach builds upon
the algorithms to find the ** shortest common supersequence **and the

Lastly, we tested our algorithms on simulated data and on data from previous DNA experiments. We compared the edit error rates of our algorithms with the edit error rates of previous algorithms. In all of the tests our algorithms presented significant improvements of the edit error rates.