Technion - Israel Institute of Technology, Graduate School
Ph.D. Thesis
Ph.D. Student: Bercovici Sivan
Subject: Mapping by Admixture Linkage Disequilibrium: New Criteria and Algorithms
Department: Department of Computer Science
Supervisor: Professor Dan Geiger


Abstract

Much effort has recently been invested in developing methods for determining the ancestral origin of chromosomal segments in admixed individuals. This task is motivated by the study of population history, such as bottleneck effects and migration; the assessment of population stratification for adequate adjustment of association studies; and the enhancement of mapping by admixture linkage disequilibrium (MALD). MALD, also known as admixture mapping, offers a statistically powerful and economical gene mapping approach for identifying genomic regions that harbor disease susceptibility genes in recently admixed populations. The method is applicable when the prevalence of a disease differs significantly between the ancestral populations from which the admixed population was formed. When such a disease is studied, admixed individuals carrying the hereditary disease are expected to show, around the disease gene loci, an elevated genomic contribution from the ancestral population with the higher prevalence of the disease. The MALD method has successfully discovered multiple risk alleles for prostate cancer, a disease with a higher incidence rate in Africans than in Europeans, and a candidate locus for end-stage kidney disease in African Americans.

A MALD study comprises three main steps. First, a panel of ancestry informative markers (AIMs) that differentiate well between the ancestral populations is designed. Next, either cases alone or both cases and controls are individually genotyped using the AIM panel, and the mosaic of ancestries of each individual is inferred. Finally, the inferred ancestral profiles are scanned in search of an aberration toward the ancestral population with the higher risk, which is expected to appear near the disease locus. In our work we address all three steps, offering enhancements and extensions that increase the performance or reduce the cost of such studies. We are motivated by the fact that improving these steps is of great medical significance, potentially yielding preventive and therapeutic options for human hereditary diseases.
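The final scanning step can be illustrated with a minimal case-only sketch. It assumes that per-individual local ancestry estimates are already available as a matrix, and it standardizes each locus against the spread observed across loci. The function name `case_only_scan`, the input layout, and this particular standardization are illustrative assumptions, not the statistic used in the thesis.

```python
import statistics

def case_only_scan(ancestry):
    """Return a standardized ancestry deviation per locus.

    ancestry[i][j] is the estimated proportion of higher-risk ancestry
    carried by case i at locus j (hypothetical input layout).
    """
    n_loci = len(ancestry[0])
    # Mean higher-risk ancestry across all cases, one value per locus.
    locus_means = [statistics.fmean(row[j] for row in ancestry)
                   for j in range(n_loci)]
    # Genome-wide baseline and spread, estimated from the loci themselves.
    genome_mean = statistics.fmean(locus_means)
    spread = statistics.pstdev(locus_means)
    if spread == 0:
        return [0.0] * n_loci
    return [(mu - genome_mean) / spread for mu in locus_means]
```

Loci with a large positive score carry more higher-risk ancestry than the genome-wide average and would be candidates for follow-up under this simplified scheme.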

For the selection of a panel of ancestry informative markers, we develop an information-theoretic measure, called EMI (expected mutual information), that quantifies the impact of a set of biological markers on the ability to infer ancestry at each chromosomal location. While recent marker selection methods focus mainly on selecting the most informative markers, our measure offers a well-balanced selection of markers that takes genome coverage into consideration. Namely, our method overcomes the deficiencies of previous methods, which produce over-crowded marker regions in information-saturated locations coupled with information "blind spots". We present an effective algorithm for panel selection that strives to maximize the EMI score. Using these tools, we construct panels for the African-American admixed population.
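The abstract does not define EMI, so the sketch below shows only the simpler single-marker building block that such measures rest on: the mutual information between ancestry and the allele observed at one biallelic marker. The function name `marker_mutual_information`, the two-population setting, and the admixture proportion `m` are assumptions made for illustration. Unlike EMI, this quantity ignores neighboring markers and genome coverage.

```python
import math

def marker_mutual_information(p_a, p_b, m=0.5):
    """Mutual information, in bits, between ancestry and one observed allele.

    p_a, p_b: frequency of the reference allele in ancestral populations A and B.
    m: prior probability that a chromosome segment derives from population A.
    """
    def binary_entropy(q):
        if q <= 0.0 or q >= 1.0:
            return 0.0
        return -(q * math.log2(q) + (1.0 - q) * math.log2(1.0 - q))

    # Allele frequency in the admixed population.
    p_mix = m * p_a + (1.0 - m) * p_b
    # I(ancestry; allele) = H(allele) - H(allele | ancestry).
    conditional = m * binary_entropy(p_a) + (1.0 - m) * binary_entropy(p_b)
    return binary_entropy(p_mix) - conditional
```

A marker with identical frequencies in both populations scores zero, while a marker fixed for different alleles in the two populations scores one bit when `m` is 0.5.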

For the ancestry inference stage, we develop a framework that incorporates complex probability models that account for linkage disequilibrium (LD) in the ancestral populations. Unlike other state-of-the-art methods, this model allows the use of dense marker panels, which increases inference accuracy and directly improves the performance of MALD. Our method is the first to describe how arbitrary LD models can be used for the task of ancestry inference.
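For orientation, the sketch below shows the kind of baseline that LD-aware frameworks extend: a two-ancestry hidden Markov model on a single haploid chromosome, solved with the forward-backward algorithm. It treats markers as independent given ancestry, so it deliberately omits the ancestral-population LD that the thesis models. The function name, the parameters `m` and `generations`, and the exponential recombination model are assumptions for illustration, and allele frequencies are assumed to lie strictly between 0 and 1.

```python
import math

def posterior_ancestry(alleles, freqs_a, freqs_b, positions, m=0.5, generations=6.0):
    """Posterior probability of ancestry A at each marker of one haploid chromosome.

    alleles: observed alleles (0 or 1); freqs_a, freqs_b: frequency of allele 1
    in populations A and B; positions: marker positions in Morgans.
    """
    n = len(alleles)
    prior = (m, 1.0 - m)

    def emission(i):
        fa, fb = freqs_a[i], freqs_b[i]
        return (fa, fb) if alleles[i] == 1 else (1.0 - fa, 1.0 - fb)

    def transition(i):
        # No recombination since admixture between markers i-1 and i keeps the
        # ancestry; otherwise the ancestry is redrawn from the prior.
        stay = math.exp(-generations * (positions[i] - positions[i - 1]))
        return [[(stay if s == t else 0.0) + (1.0 - stay) * prior[t]
                 for t in range(2)] for s in range(2)]

    def normalize(v):
        total = sum(v)
        return [x / total for x in v]

    # Forward pass, rescaled at every marker for numerical stability.
    e = emission(0)
    fwd = [normalize([prior[s] * e[s] for s in range(2)])]
    for i in range(1, n):
        T, e = transition(i), emission(i)
        fwd.append(normalize([e[t] * sum(fwd[-1][s] * T[s][t] for s in range(2))
                              for t in range(2)]))

    # Backward pass.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        T, e = transition(i + 1), emission(i + 1)
        bwd[i] = normalize([sum(T[s][t] * e[t] * bwd[i + 1][t] for t in range(2))
                            for s in range(2)])

    # Combine both passes into per-marker posteriors for ancestry A.
    return [normalize([fwd[i][s] * bwd[i][s] for s in range(2)])[0] for i in range(n)]
```

With dense panels, the independence assumption in `emission` is exactly what breaks down, which is why a model of LD within each ancestral population is needed.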

Finally, we develop a novel approach that considerably reduces the genotyping cost of disease studies by applying admixture aberration analysis (AAA) to pooled DNA of affected admixed individuals. Our analysis detects divergence of the allele distribution in a pool of samples near a disease locus without the intermediate step of per-individual ancestry inference. The inherent aberration in admixture around the disease locus shifts the sampled allele frequencies toward the distribution of the alleles in the ancestry with the higher risk. It is the examination of this shift, evaluated through the estimation of allele frequencies in the pooled sample, that provides the means for our pooled mapping method. As in our work on ancestry inference, AAA accounts for LD in the ancestral populations to allow the use of dense marker panels, further increasing the method's power.
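The shift described above can be made concrete for a single marker. If a fraction theta of the pooled chromosomes at a locus derives from the higher-risk population, the expected pooled allele frequency is theta * p_high + (1 - theta) * p_low. The sketch below inverts this relation and scores the deviation from the genome-wide admixture proportion. The function names, the binomial sampling of 2N chromosomes, and the absence of pooling measurement error are simplifying assumptions; the sketch also ignores LD, which AAA accounts for.

```python
import math

def implied_local_ancestry(f_obs, p_high, p_low):
    """Higher-risk ancestry fraction implied by a pooled allele frequency."""
    return (f_obs - p_low) / (p_high - p_low)

def pooled_shift_z(f_obs, p_high, p_low, m, n_individuals):
    """Z-score of the pooled frequency against the genome-wide expectation.

    p_high, p_low: allele frequency in the higher- and lower-risk populations.
    m: genome-wide proportion of higher-risk ancestry in the pool.
    Positive values indicate a shift toward the higher-risk population.
    """
    f_null = m * p_high + (1.0 - m) * p_low
    # Sampling error of a frequency estimated from 2N chromosomes.
    se = math.sqrt(f_null * (1.0 - f_null) / (2 * n_individuals))
    z = (f_obs - f_null) / se
    # Orient the score so that "toward higher-risk ancestry" is positive.
    return z if p_high >= p_low else -z
```

For example, with `p_high = 0.8` and `p_low = 0.2`, an observed pooled frequency of 0.56 implies a local higher-risk ancestry of about 0.6, compared with 0.5 when the pooled frequency equals the genome-wide expectation.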