|M.Sc Student||Goldin Inna|
|Subject||Power of Genetic Association Tests for Populations with|
|Department||Department of Industrial Engineering and Management||Supervisor||Professor Emeritus Paul Feigin|
|Full Thesis text - in Hebrew|
Population-based association studies provide an attractive approach to identifying susceptibility genes underlying complex genetic traits. However, case-control studies typically rely on the unrealistic assumption of population homogeneity. The presence of population structure can produce a “spurious association” between phenotype and genotype, which inflates the number of false positives. Such spurious association can occur when disease frequency varies across subpopulations that also differ in their allele frequencies.
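To make the mechanism concrete, the short sketch below uses hypothetical numbers, not values from the simulations in this work. The marker has no effect on disease within either subpopulation, yet pooling the samples yields an odds ratio far from 1, because the high-prevalence subpopulation, which also has the lower allele frequency, supplies most of the cases.

```python
def pooled_allele_freq(weights, freqs):
    """Allele frequency in a sample that mixes subpopulations in the given proportions."""
    return sum(w * p for w, p in zip(weights, freqs))


def odds_ratio(p_case, p_control):
    """Allelic odds ratio between cases and controls."""
    return (p_case / (1 - p_case)) / (p_control / (1 - p_control))


# Hypothetical marker allele frequencies in subpopulations 1 and 2.
freqs = [0.2, 0.5]
# Hypothetical sampling: subpopulation 1 has higher disease prevalence,
# so it contributes most of the cases and few of the controls.
case_weights = [0.8, 0.2]
control_weights = [0.2, 0.8]

p_case = pooled_allele_freq(case_weights, freqs)
p_control = pooled_allele_freq(control_weights, freqs)

# Within each subpopulation, cases and controls share the same frequency (OR = 1),
# but the pooled sample shows an apparent protective effect.
print(round(p_case, 2), round(p_control, 2))    # 0.26 0.44
print(round(odds_ratio(p_case, p_control), 3))  # 0.447
```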
The current work presents a simulation study of four statistical methods: three that address the problem of population substructure, and one that does not.
The method that does not consider population stratification is the Cochran-Armitage test for trend. This method focuses the chi-squared test on a narrow set of alternatives, namely a trend in disease risk across ordered genotype classes, and is therefore especially sensitive to the anticipated trend.
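A minimal sketch of the trend test, assuming a genotype-based 2 x 3 table (cases and controls counted by number of copies of one allele) with additive scores (0, 1, 2). The abstract does not specify the scores or whether a genotype- or allele-based version was used, and the function name `cochran_armitage` is hypothetical.

```python
import math


def cochran_armitage(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend test for a 2 x 3 genotype table.

    cases, controls: genotype counts, e.g. (aa, Aa, AA).
    scores: weights t_i; (0, 1, 2) corresponds to an additive model (assumption).
    Returns (Z, two-sided p-value); Z**2 is the 1-df chi-squared statistic.
    """
    R, S = sum(cases), sum(controls)
    N = R + S
    totals = [r + s for r, s in zip(cases, controls)]
    sum_tr = sum(t * r for t, r in zip(scores, cases))
    sum_tn = sum(t * n for t, n in zip(scores, totals))
    sum_t2n = sum(t * t * n for t, n in zip(scores, totals))
    # T = sum_i t_i (r_i S - s_i R), rewritten in terms of the column totals.
    T = N * sum_tr - R * sum_tn
    # Null variance of T, conditional on the margins of the table.
    var_T = (R * S / N) * (N * sum_t2n - sum_tn ** 2)
    z = T / math.sqrt(var_T)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p


# Hypothetical counts: cases are enriched for the second allele.
z, p = cochran_armitage([20, 50, 30], [40, 45, 15])
print(z, p)  # Z is about 3.43, p < 0.001
```

The statistic is undefined (zero variance) if every subject falls into a single genotype class, so a real analysis would need to skip or flag such monomorphic markers.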
The three methods that allow for population stratification are:
· Genomic Control, which estimates the over-dispersion (inflation) factor of the Armitage test statistic from a set of null markers and rescales the statistic accordingly;
· Structure, a Bayesian approach that uses unlinked genotypes to infer population substructure, implemented via Markov chain Monte Carlo; and
· L-POP, which detects population stratification using a latent class analysis model.
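The Genomic Control step can be sketched as follows, using the usual formulation in which the inflation factor lambda is the median of the 1-df trend statistics at null markers divided by the median of a chi-squared distribution with 1 df (about 0.455), floored at 1 by a common convention. The helper names, the flooring, and the example statistics are assumptions for illustration, not necessarily the exact procedure used in this study.

```python
import math
import statistics

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-squared distribution with 1 df


def genomic_control_lambda(null_stats):
    """Inflation factor estimated from 1-df statistics at null (unassociated) markers."""
    lam = statistics.median(null_stats) / CHI2_1DF_MEDIAN
    # Common convention: never deflate the statistics (lambda >= 1).
    return max(lam, 1.0)


def gc_adjust(stat, lam):
    """Divide a 1-df chi-squared statistic by lambda and return (adjusted stat, p-value)."""
    adjusted = stat / lam
    # Upper tail of chi-squared(1): P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(adjusted / 2))
    return adjusted, p


# Hypothetical trend statistics at null markers and at a candidate marker.
null_stats = [0.02, 0.31, 0.58, 1.2, 0.9, 2.4, 0.07]
lam = genomic_control_lambda(null_stats)
print(lam, gc_adjust(9.5, lam))
```

Because a single lambda rescales every marker, the correction does not require knowing how many subpopulations exist, but it also reduces the statistic at a truly associated marker, which is one way a loss of power can arise.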
A simulation study was carried out to assess how well each method protects against the effects of cryptic substructure. The EASYPOP simulation program was chosen to simulate datasets of polymorphisms throughout the genome, using an island model with two subpopulations and migration between them. Six demographic scenarios were simulated to assess the performance of the various methods; they vary in relative subpopulation sizes, between-population variance in allele frequencies, and disease prevalence.
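The island model can be illustrated with a toy Wright-Fisher simulation. This is a hypothetical sketch, not EASYPOP: two equal-sized islands exchange a fraction m of their genes each generation, drift is modeled by binomial sampling of 2N gene copies, and FST is estimated with a simple ratio-of-averages estimator across loci. All parameter values are illustrative assumptions.

```python
import random


def two_island_drift(n_diploid, m, generations, p0, rng):
    """Allele frequencies at one biallelic locus in two islands after drift and migration."""
    p = [p0, p0]
    copies = 2 * n_diploid
    for _ in range(generations):
        # Symmetric migration: a fraction m of each island's genes come from the other island.
        mixed = [(1 - m) * p[0] + m * p[1], (1 - m) * p[1] + m * p[0]]
        # Genetic drift: binomial sampling of 2N gene copies in each island.
        p = [sum(rng.random() < q for _ in range(copies)) / copies for q in mixed]
    return p


def fst(loci):
    """Ratio-of-averages FST: sum of var(p) over sum of pbar(1 - pbar), equal island sizes."""
    num = den = 0.0
    for p1, p2 in loci:
        pbar = (p1 + p2) / 2
        num += ((p1 - pbar) ** 2 + (p2 - pbar) ** 2) / 2
        den += pbar * (1 - pbar)
    return num / den if den > 0 else 0.0


rng = random.Random(1)
# Hypothetical settings: N = 200 per island, m = 0.01, 50 generations, 50 unlinked loci.
loci = [two_island_drift(200, 0.01, 50, rng.uniform(0.1, 0.9), rng) for _ in range(50)]
print(round(fst(loci), 3))
```

Lower migration rates or smaller islands let allele frequencies drift further apart, raising FST and with it the potential for spurious association when disease prevalence also differs between islands.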
The simulation study compared the methods in terms of the number of false positives and the power to detect the gene associated with the disease.
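The two evaluation criteria can be computed from per-replicate p-values, as in the sketch below. The `Replicate` container, the level alpha = 0.05, and the absence of a multiple-testing correction are illustrative assumptions, since the abstract does not state the significance threshold used.

```python
from dataclasses import dataclass


@dataclass
class Replicate:
    """Hypothetical container for one simulated dataset analysed by one method."""
    null_pvalues: list[float]  # p-values at markers not associated with the disease
    disease_pvalue: float      # p-value at the simulated disease locus


def summarize(replicates, alpha=0.05):
    """Return (mean false positives per replicate, empirical power) at level alpha."""
    false_pos = [sum(p < alpha for p in r.null_pvalues) for r in replicates]
    hits = [r.disease_pvalue < alpha for r in replicates]
    n = len(replicates)
    return sum(false_pos) / n, sum(hits) / n


# Hypothetical p-values from two replicates.
reps = [Replicate([0.01, 0.2, 0.5], 0.001), Replicate([0.3, 0.04, 0.03], 0.2)]
print(summarize(reps))  # (1.5, 0.5)
```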
The study reached the following conclusions. Under typical values of FST and under the conditions of the simulated scenarios, all the methods performed well, limiting the number of false positives. The only method that caused a loss of power, in some scenarios, was Genomic Control. On the other hand, Genomic Control has the significant advantage of being independent of the number of subpopulations. The main recommendation of this study, under the investigated conditions, is to use Genomic Control to detect the presence of population substructure, and to use one of two methods, Structure or L-POP, to overcome the effect of the cryptic structure.