Technion - Israel Institute of Technology, Graduate School
Ph.D. Thesis
Ph.D. Student: Greenspan Gideon Dov
Subject: A Bayesian Network Model of Haplotype Block Variation: Inference and Application
Department: Department of Computer Science
Supervisor: Professor Dan Geiger


Abstract

Recent studies of the human genome have uncovered a block-like pattern of SNP variation. Haplotype blocks are defined as chromosomal stretches in which a small number of multi-marker variants cover most of the observed variation. It is believed that haplotype blocks are generated by hotspots of recombination, which account for the vast majority of crossovers during meiosis.
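
As a rough illustration of the block definition above, the following sketch measures how much of the observed variation in a chromosomal stretch is covered by its few most common multi-marker variants. The function name `common_variant_coverage`, the string encoding of haplotypes (one character per SNP) and the toy data are assumptions made for this example; they are not taken from the thesis or from the HaploBlock package.

```python
from collections import Counter


def common_variant_coverage(haplotypes, start, end, k):
    """Fraction of haplotypes whose alleles in [start, end) match
    one of the k most common multi-marker variants in that stretch."""
    # Cut the same SNP range out of every observed haplotype.
    segments = [h[start:end] for h in haplotypes]
    counts = Counter(segments)
    # Sum the counts of the k most frequent variants.
    covered = sum(count for _, count in counts.most_common(k))
    return covered / len(segments)


# Toy data (hypothetical): six haplotypes over four biallelic SNPs.
haplotypes = ["0011", "0011", "0011", "1100", "1100", "0101"]
coverage = common_variant_coverage(haplotypes, 0, 4, 2)
print(coverage)
```

In this toy data, two variants account for five of the six haplotypes, so the stretch would look block-like under a coverage threshold below 5/6. The threshold itself is a modelling choice that this sketch does not fix.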


We formulated a statistical model of haplotype block variation that accounts for recombinations, mutations and population genetic effects. Our model is based on a Bayesian Network with a Markov chain at its core. We developed two heuristic learning algorithms to infer the instances of our model that best fit observed haplotype, genotype or trio data.
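
To make the idea of a Markov chain over blocks more concrete, here is a minimal sketch of a first-order chain in which each state is the variant carried in one block. It is a deliberate simplification: it ignores the mutation and population genetic components mentioned above, and the function name `chain_probability`, the parameter layout and all numbers are hypothetical rather than drawn from the thesis.

```python
def chain_probability(variants, initial, transitions):
    """Probability of a sequence of block variants under a
    first-order Markov chain.

    variants    -- variant index carried in each consecutive block
    initial     -- initial[v] is the probability of variant v in block 0
    transitions -- transitions[i][a][b] is the probability of variant b
                   in block i + 1 given variant a in block i
    """
    prob = initial[variants[0]]
    for i in range(len(variants) - 1):
        prob *= transitions[i][variants[i]][variants[i + 1]]
    return prob


# Hypothetical parameters: three blocks, two variants per block.
initial = [0.6, 0.4]
transitions = [
    [[0.9, 0.1], [0.2, 0.8]],  # block 0 -> block 1
    [[0.7, 0.3], [0.5, 0.5]],  # block 1 -> block 2
]
p = chain_probability([0, 0, 1], initial, transitions)
print(p)
```

Transition rows close to the identity, as between blocks 0 and 1 here, correspond to little recombination between neighbouring blocks, while rows close to uniform correspond to frequent recombination at the boundary.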


Our model and learning algorithms were applied to three biological problems, with promising results. The first application is haplotype resolution, which infers pairs of haplotypes underlying a set of genotype observations. The second application is linkage disequilibrium (LD) mapping, which searches for a hidden genetic factor causing phenotypic variation. The third application is the inference of recombination structure from a set of raw genomic sequences.
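
The haplotype resolution problem can be illustrated by listing every pair of haplotypes consistent with a single genotype. The sketch below does only that; it is not the inference procedure of the thesis, which uses the statistical model to choose among such pairs. The genotype encoding (0 and 1 for the two homozygous states, 2 for heterozygous) and the name `compatible_pairs` are assumptions made for this example.

```python
from itertools import product


def compatible_pairs(genotype):
    """List the unordered haplotype pairs consistent with a genotype.

    genotype -- per-SNP codes: 0 or 1 for homozygous sites,
                2 for heterozygous sites (assumed encoding)
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 2]
    pairs = set()
    # Each heterozygous site can be phased in two ways.
    for choice in product("01", repeat=len(het_sites)):
        first = [str(g) for g in genotype]
        second = [str(g) for g in genotype]
        for site, allele in zip(het_sites, choice):
            first[site] = allele
            second[site] = "1" if allele == "0" else "0"
        # Sort so that (a, b) and (b, a) count as the same pair.
        pairs.add(tuple(sorted(("".join(first), "".join(second)))))
    return sorted(pairs)


pairs = compatible_pairs([0, 2, 1, 2])
print(pairs)
```

A genotype with h heterozygous sites (h >= 1) has 2^(h-1) compatible pairs, so exhaustive listing is practical only for short stretches. This growth is one reason a statistical model is needed to rank the candidates.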


We also addressed two key questions by examining high-density data from the International HapMap Project. First, we confirmed the role of recombination hotspots in generating haplotype blocks, which has been the subject of much debate. Second, we showed that a Markov model over haplotype blocks is uniquely accurate for representing high-density SNP variation.


Our statistical model and algorithms have been implemented as the HaploBlock software package.