|M.Sc Student||Wexler Ydo|
|Subject||Finding Approximate Tandem Repeats in Genomic Sequences|
|Department||Department of Computer Science|
|Supervisor||Professor Dan Geiger|
Duplications occur in sequences, such as DNA, that are controlled by molecular mechanisms. The duplicated patterns, which often undergo minor modifications, are called approximate tandem repeats. The origin of these repeats, as well as their biological function, is not fully understood. Nevertheless, they are believed to play an important role in genome organization and evolution and are known to cause several diseases. Replication slippage, unequal crossing-over, and evolutionary pressures cause a high degree of polymorphism in the number of repeats. Approximate tandem repeats are therefore useful markers for genetic studies, e.g., for DNA fingerprinting and for mapping disease genes. Several studies have also shown that tandem repeat polymorphism plays an important role in the adaptation of pathogenic bacteria to their hosts and may also have pharmacological effects in humans. Several algorithms for detecting approximate tandem repeats have been suggested in recent years.
Here, we suggest several definitions of an approximate tandem repeat via the concept of sequence alignment and present an efficient algorithm for detecting approximate tandem repeats in genomic sequences. The algorithm is based on a flexible statistical model that allows a wide range of definitions of approximate tandem repeats beyond those suggested. The ideas and methods underlying the algorithm are described and examined, and its effectiveness on real-world genomic data and synthetic DNA sequences is demonstrated, including a comparison with previously suggested algorithms. In addition, approximation algorithms for the relevant distributions of the statistical model are introduced, and their closeness to the exact distributions is shown. The algorithm has been implemented and is available at http://bioinfo.cs.technion.ac.il/ATRHunter.
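To make the idea of an alignment-based definition concrete, the following is a minimal Python sketch, not the algorithm of this work and not the ATRHunter implementation. It assumes one simple, hypothetical criterion: two adjacent windows of equal length form an approximate tandem repeat if their edit distance (the cost of an optimal unit-cost alignment) is at most a chosen threshold. The function names, the fixed-window assumption, and the threshold parameter `max_diff` are all illustrative choices.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # match or substitute
        prev = curr
    return prev[-1]


def adjacent_copy_distance(seq: str, start: int, period: int) -> int:
    """Edit distance between two consecutive windows of length `period`."""
    first = seq[start:start + period]
    second = seq[start + period:start + 2 * period]
    return edit_distance(first, second)


def is_approximate_tandem_repeat(seq: str, start: int, period: int,
                                 max_diff: int) -> bool:
    """Hypothetical criterion: adjacent copies differ by at most `max_diff` edits.

    Assumes both copies have exactly `period` characters; a fuller definition
    would let the copy boundary shift when insertions or deletions occur.
    """
    if period <= 0 or start < 0 or start + 2 * period > len(seq):
        return False
    return adjacent_copy_distance(seq, start, period) <= max_diff


if __name__ == "__main__":
    dna = "ACGTACGAACGT"  # "ACGT" followed by the modified copy "ACGA"
    print(adjacent_copy_distance(dna, 0, 4))
    print(is_approximate_tandem_repeat(dna, 0, 4, max_diff=1))
```

Fixing the window length keeps the sketch short, but it is a simplification: with insertions and deletions the two copies need not have the same length, which is one reason a more flexible, alignment-based definition is useful.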