Ph.D Thesis

Ph.D StudentSwidan Firas
SubjectComparative Genomics: from Accurate Mapping to
Sorting by Weighted Reversals and Repeat-Annotated
DepartmentDepartment of Computer Science
Supervisor PROF. Ron Pinter


This work focuses on a family of algorithmic and combinatorial problems in comparative genomics and the biological implications of their solutions.  They range from an integrative accurate mapping method through efficient algorithms for handling both repeat-constrained as well as length-weighted genome rearrangement models to a novel phylogenetic inference technique. Our algorithmic contributions and the biological insights gained from their applications are as follows:

1.      We present a novel approach for comparative mapping resulting in a highly accurate parsing of the genome into rearrangement-free segments.  By applying our method to pairs of prokaryotic genomes, we are able to compare, for the first time, the different forces shaping these genomes. In particular, we show that horizontal gene transfer and large deletions affect genome evolution significantly more than duplications.  Furthermore, we show - unexpectedly - a good fit between the Nadeau-Taylor model's predictions and observed data in most cases.

2.      Recent biological findings indicate a strong association between reversals and repeats.  We consider the implications of these findings on the essential problem of phylogenetic inference.  The inference problem has been addressed by a wide class of methods, all of which have been substantially criticized.  In contrast, under our new model, we show that the evolutionary process - including tree-topology and inner-node assignments - is uniquely determined.  This surprising result justifies viewing the repeats as “footprints” left by evolution, providing accurate evidence of the path it chose.  Furthermore, we provide non-trivial linear-time algorithms for reconstructing the evolutionary process, which exploit - among other things - a new data structure extending word-tries to the case of special collections of (unordered) sets.

3.      Finally, we study the problem of sorting all 8 alternatives of  {singed,unsigned}*{linear, circular }*{0/1 sequence,   permutations} by length-weighted reversals.  We consider a wide class of cost functions, namely f(l)=la for  a >0, where l is the length of the reversed  subsequence.  We present surprising and non-trivial polynomial-time algorithms for the problems of sorting unsigned (linear and circular) 0/1 sequences when a=1, as well as approximation algorithms for sorting signed (linear and circular) sequences. Based on these results we improve the approximation guarantee for all variants of the problem of sorting permutations when a=1.