Technion - Israel Institute of Technology, Graduate School
M.Sc Thesis
M.Sc Student: Tzemach Anna
Subject: Preparing SNP Data for Genetic Linkage Analysis
Department: Department of Computer Science
Supervisor: Professor Dan Geiger
Full thesis text: English version


Abstract

Single nucleotide polymorphisms (SNPs) are stably inherited, highly abundant, and distributed throughout the genome. Current estimates are that SNPs occur as frequently as once every 100-300 bases, which implies approximately 10 to 30 million potential SNPs in an entire human genome. More than 4 million SNPs have been identified and made publicly available. Their large number makes SNPs good candidates for linkage analysis, but it also introduces new problems that were not significant for highly polymorphic markers. Some of these problems originate in the genotyping process; others result from restrictions of current linkage software. In this thesis we propose an algorithm for preprocessing SNP data and implement a tool, SNPdistiller, that handles the complete process of preparing SNPs for linkage analysis, from the data produced by genotyping to the creation of an input file suitable for currently available linkage analysis tools. The tool begins by removing erroneous and unlikely SNPs from the data and continues by organizing SNPs into clusters that simulate the behavior of highly polymorphic, informative markers. The algorithm takes into consideration both the genetic data and the capabilities of the linkage analysis software. Experimental results demonstrate the performance of SNPdistiller on simulated and real datasets. The thesis ends by proposing further enhancements to the algorithms implemented in SNPdistiller.
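The two-stage shape described in the abstract (filter first, then cluster) can be illustrated with a minimal sketch. This is not the SNPdistiller algorithm: the abstract does not state its filtering criteria or clustering rule, so everything below is an assumption made for illustration. The `Snp` type, the function names, the call-rate and minor-allele-frequency thresholds, and the greedy grouping by cluster size and base-pair span are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Snp:
    """Hypothetical record for one SNP; not the thesis's data model."""
    name: str
    position: int  # base-pair position on the chromosome
    # One entry per individual: an allele pair such as ("A", "G"),
    # or None when the genotype call is missing.
    genotypes: List[Optional[Tuple[str, str]]]


def call_rate(snp: Snp) -> float:
    """Fraction of individuals with a non-missing genotype call."""
    if not snp.genotypes:
        return 0.0
    called = sum(g is not None for g in snp.genotypes)
    return called / len(snp.genotypes)


def minor_allele_frequency(snp: Snp) -> float:
    """Frequency of the rarest observed allele; 0.0 if fewer than two alleles are seen."""
    counts = {}
    for genotype in snp.genotypes:
        if genotype is None:
            continue
        for allele in genotype:
            counts[allele] = counts.get(allele, 0) + 1
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0
    return min(counts.values()) / total


def filter_snps(snps: List[Snp], min_call_rate: float = 0.9,
                min_maf: float = 0.05) -> List[Snp]:
    """Stage 1 (assumed criteria): drop poorly called or uninformative SNPs."""
    return [s for s in snps
            if call_rate(s) >= min_call_rate
            and minor_allele_frequency(s) >= min_maf]


def cluster_snps(snps: List[Snp], max_size: int = 3,
                 max_span: int = 50_000) -> List[List[Snp]]:
    """Stage 2 (assumed rule): greedily group neighbouring SNPs by position.

    A SNP joins the current cluster if the cluster has room and the SNP lies
    within max_span base pairs of the cluster's first SNP; otherwise it
    starts a new cluster.
    """
    clusters: List[List[Snp]] = []
    for snp in sorted(snps, key=lambda s: s.position):
        current = clusters[-1] if clusters else None
        if (current is not None and len(current) < max_size
                and snp.position - current[0].position <= max_span):
            current.append(snp)
        else:
            clusters.append([snp])
    return clusters
```

In this sketch the `max_size` parameter stands in for the abstract's point that the capabilities of the linkage software constrain the preprocessing: each cluster is meant to be treated downstream as one multi-allelic marker, and the number of alleles such a marker can have grows with the number of SNPs it contains.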