Technion - Israel Institute of Technology, Graduate School
M.Sc Thesis
M.Sc Student: Goldin Inna
Subject: Power of Genetic Association Tests for Populations with Sub-Structure
Department: Department of Industrial Engineering and Management
Supervisor: Professor Emeritus Paul Feigin
Full thesis text: in Hebrew


Abstract

Population-based association studies provide an attractive approach to the identification of susceptibility genes underlying complex genetic traits. However, case-control studies typically rely on the unrealistic assumption of population homogeneity. The presence of population structure can result in “spurious association” between phenotype and genotype, which inflates the number of false positives. Such spurious association can occur when the disease frequency varies across subpopulations whose allele frequencies also differ.
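The mechanism can be illustrated with a small worked example. The sketch below is not taken from the thesis; the sample sizes, exposure frequencies, and prevalences are made-up illustrative values, and the function names are hypothetical. Within each subpopulation the marker is independent of disease, yet pooling the two subpopulations produces an odds ratio well above 1.

```python
def expected_counts(n, exposure_freq, prevalence):
    """Expected 2x2 counts when the marker and the disease are independent.

    Returns (cases exposed, cases unexposed, controls exposed, controls unexposed).
    """
    cases = n * prevalence
    controls = n - cases
    return (
        cases * exposure_freq,
        cases * (1 - exposure_freq),
        controls * exposure_freq,
        controls * (1 - exposure_freq),
    )


def odds_ratio(table):
    """Odds ratio of a 2x2 table given in the order used by expected_counts."""
    case_exp, case_unexp, ctrl_exp, ctrl_unexp = table
    return (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)


# Illustrative values only: subpopulation B has both a higher marker
# frequency and a higher disease prevalence than subpopulation A.
sub_a = expected_counts(n=1000, exposure_freq=0.2, prevalence=0.05)
sub_b = expected_counts(n=1000, exposure_freq=0.6, prevalence=0.20)
pooled = tuple(a + b for a, b in zip(sub_a, sub_b))

print("OR within A:", odds_ratio(sub_a))
print("OR within B:", odds_ratio(sub_b))
print("OR pooled:  ", odds_ratio(pooled))
```

Both within-subpopulation odds ratios equal 1 (up to rounding), while the pooled odds ratio is roughly 1.75, even though no causal relationship was built into the counts.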


The current work presents a simulation study of three statistical methods that deal with the problem of population substructure, and one that does not.

The method that does not consider population stratification is the Cochran-Armitage test for trend. This test directs the chi-squared statistic toward a narrow set of alternatives, namely a trend in disease risk across ordered genotypes, and is therefore especially sensitive to the anticipated trend.
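For concreteness, a minimal sketch of the trend statistic follows. It uses the standard textbook form of the Cochran-Armitage test with additive genotype scores 0, 1, 2 and a 1-degree-of-freedom chi-squared reference distribution; the scores, the function name, and the example counts are assumptions for illustration and are not taken from the thesis.

```python
import math


def cochran_armitage_trend(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend statistic and its 1-df chi-squared p-value.

    cases[i] and controls[i] are the counts in genotype class i;
    scores[i] is the dose assigned to that class (additive by default).
    """
    totals = [r + s for r, s in zip(cases, controls)]
    n = sum(totals)
    r_total = sum(cases)
    s_total = n - r_total

    sum_xr = sum(x * r for x, r in zip(scores, cases))
    sum_xn = sum(x * t for x, t in zip(scores, totals))
    sum_x2n = sum(x * x * t for x, t in zip(scores, totals))

    numerator = n * (n * sum_xr - r_total * sum_xn) ** 2
    denominator = r_total * s_total * (n * sum_x2n - sum_xn ** 2)
    stat = numerator / denominator

    # Upper tail of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical genotype counts (aa, Aa, AA) with a clear trend.
stat, p_value = cochran_armitage_trend(cases=[10, 20, 30], controls=[30, 20, 10])
print(stat, p_value)
```

When the case proportion is identical in every genotype class, the numerator vanishes and the statistic is 0, which is the sense in which the test targets a trend rather than arbitrary differences between classes.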
The three methods that allow for population stratification are:

· Genomic Control, which estimates the over-dispersion factor of the Armitage test statistic from a set of null markers;

· Structure, which takes a Bayesian approach, using unlinked genotypes to infer population substructure, implemented via Markov Chain Monte Carlo; and

· L-POP, which detects population stratification by utilizing a Latent Class Analysis model.
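Of the three, Genomic Control is the simplest to sketch. The example below assumes the commonly used median-based estimator of the over-dispersion factor (the median of the null-marker statistics divided by the median of a 1-df chi-squared distribution, about 0.4549, and floored at 1). The abstract does not say which estimator the thesis used, so this choice, the function names, and the example values are all assumptions.

```python
import statistics

# Median of a chi-squared distribution with 1 degree of freedom (approximate).
CHI2_1DF_MEDIAN = 0.4549


def genomic_control_lambda(null_stats):
    """Estimate the over-dispersion factor from null-marker trend statistics.

    Assumption: median-based estimator, never allowed to fall below 1
    so that the correction cannot inflate a test statistic.
    """
    return max(1.0, statistics.median(null_stats) / CHI2_1DF_MEDIAN)


def adjust_statistic(stat, lam):
    """Deflate a candidate marker's trend statistic by the estimated factor."""
    return stat / lam


# Hypothetical trend statistics observed at null markers.
null_stats = [0.2, 0.9098, 2.0]
lam = genomic_control_lambda(null_stats)
print(lam, adjust_statistic(8.0, lam))
```

Because the factor is estimated from the distribution of the statistics alone, no assumption about the number of subpopulations enters the calculation.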

A simulation study was carried out in order to assess each method's success in protecting against the effects of cryptic substructure. The EASYPOP simulation program was chosen to simulate datasets of polymorphisms throughout the genome. Six demographic scenarios were simulated in order to assess the performance of the various methods. The scenarios differ in relative population sizes, between-population variance in allele proportions, and disease prevalence. An island model with two subpopulations and migration between them was used.
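To give a feel for the island model, here is a deliberately simplified sketch. It is not EASYPOP and does not reproduce its behaviour: it tracks a single biallelic locus under Wright-Fisher drift with symmetric migration between two diploid islands, and then computes Wright's FST as the between-population variance in allele frequency divided by p(1 - p) for the mean frequency p. All parameter values and function names are illustrative assumptions.

```python
import random


def simulate_two_islands(n1, n2, migration, generations, p0=0.5, seed=0):
    """Allele frequencies on two islands after drift and migration.

    n1, n2: numbers of diploid individuals (2N gene copies per island).
    migration: fraction of each island's gene pool drawn from the other
    island in every generation (assumed symmetric).
    """
    rng = random.Random(seed)
    p1 = p2 = p0
    for _ in range(generations):
        # Migration mixes the two gene pools.
        q1 = (1 - migration) * p1 + migration * p2
        q2 = (1 - migration) * p2 + migration * p1
        # Drift: binomial sampling of 2N gene copies per island.
        p1 = sum(rng.random() < q1 for _ in range(2 * n1)) / (2 * n1)
        p2 = sum(rng.random() < q2 for _ in range(2 * n2)) / (2 * n2)
    return p1, p2


def fst(p1, p2, w1=0.5):
    """Wright's FST for one biallelic locus and two subpopulations."""
    w2 = 1 - w1
    p_bar = w1 * p1 + w2 * p2
    if p_bar in (0.0, 1.0):
        return 0.0  # locus is fixed overall, so there is no variance to partition
    between_var = w1 * (p1 - p_bar) ** 2 + w2 * (p2 - p_bar) ** 2
    return between_var / (p_bar * (1 - p_bar))


p1, p2 = simulate_two_islands(n1=100, n2=100, migration=0.01, generations=50)
print(p1, p2, fst(p1, p2))
```

Lower migration rates and smaller islands let the two frequencies drift further apart, which is how a scenario's between-population variance can be tuned in a model of this kind.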

The simulation study compared the performance of the methods in terms of the number of false positives and the power to detect the gene associated with the disease.
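The two criteria can be tallied from per-marker p-values as in the sketch below. The significance level of 0.05, the data layout (one list of p-values per simulated dataset, with a known index for the disease marker), and the function name are assumptions for illustration; the abstract does not specify how the thesis computed these summaries.

```python
def summarize_replicates(replicates, causal_index, alpha=0.05):
    """Empirical power and mean number of false positives per dataset.

    replicates: list of simulated datasets, each a list of per-marker p-values.
    causal_index: position of the disease-associated marker in each list.
    """
    detections = 0
    false_positives = 0
    for p_values in replicates:
        if p_values[causal_index] < alpha:
            detections += 1
        false_positives += sum(
            1 for i, p in enumerate(p_values) if i != causal_index and p < alpha
        )
    n = len(replicates)
    return detections / n, false_positives / n


# Two hypothetical datasets with three markers each; marker 0 is causal.
power, mean_fp = summarize_replicates(
    [[0.01, 0.50, 0.03], [0.20, 0.60, 0.70]], causal_index=0
)
print(power, mean_fp)
```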

The study reached the following conclusions: under typical values of FST and under the conditions of the simulated scenarios, all the methods performed well, limiting the number of false positives. The only method that caused a loss of power, in some scenarios, was Genomic Control. On the other hand, the Genomic Control method has the significant advantage of being independent of the number of subpopulations. The main suggestion of this study, under the investigated conditions, is to use Genomic Control to detect the existence of population substructure, and then to use either Structure or L-POP to overcome the effect of the cryptic structure.