Technion - Israel Institute of Technology, Graduate School
Ph.D. Thesis
Ph.D. Student: Barkay Hadas
Subject: Power of Some Novel Statistical Tests for Genome-Wide Association Studies
Department: Industrial Engineering and Management
Supervisor: Professor Emeritus Paul Feigin
Full thesis text: English version


Abstract

Genome-Wide Association Studies (GWAS) have become, over the past decade, an increasingly important tool in genetic research. The statistical analysis of such studies usually relies on the asymptotic distribution of test statistics and, for small samples, on conditional exact tests. A distinction is made between the analysis of qualitative (discrete) traits, where the data can be summarized as a set of contingency tables, and the analysis of quantitative traits. In this work we develop novel tests for both types of traits. For qualitative traits, we present two novel groups of tests for small-sample genetic association studies: the Posterior Predictive P-value (PPP), constructed using a Bayesian approach, and the unconditional multinomial test (MLT), constructed using a frequentist approach. We provide specific tests for association between a binary trait and discrete genetic markers, which give rise to different types of contingency tables, for both prospective and case-control studies. Using simulation, we study the power and type I error rates of case-control PPP and MLT tests for single-SNP GWAS, based on different test statistics, including Pearson's chi-squared test and the Cochran-Armitage trend test for genetic tables. We check the sensitivity of our results to the priors used in the PPP method and to violations of the genetic population model caused by inbreeding. Using simulated GWAS, we study the power and false discovery rate (FDR) of the tests under various inheritance models. We show that the asymptotic and conditional exact tests are inferior to the MLT and PPP tests for case-control GWAS. To implement our methods in practice, we propose adapting the computational method developed by Mehta and Patel, which represents sets of contingency tables as networks, and we show how to compute exact PPP and MLT p-values as well as Monte Carlo approximations of these p-values.
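As a rough illustration of two ingredients named above, the following Python sketch computes the Cochran-Armitage trend statistic for a 2×3 case-control genotype table and wraps it in a Monte Carlo posterior predictive p-value. The PPP construction here is a generic one assumed for illustration, not necessarily the thesis's: under the null hypothesis cases and controls share genotype frequencies p, p gets a Dirichlet prior (a hypothetical default of Dirichlet(1, 1, 1)), and replicated tables are drawn with the case and control totals held fixed. The helper names (`catt_chi2`, `ppp_value`) and the additive scores (0, 1, 2) are hypothetical choices, and no attempt is made to reproduce the network-based exact computation.

```python
import random
from typing import List, Sequence

# Hypothetical additive scores for genotypes (aa, Aa, AA).
WEIGHTS = (0, 1, 2)


def catt_chi2(cases: Sequence[int], controls: Sequence[int],
              weights: Sequence[float] = WEIGHTS) -> float:
    """Cochran-Armitage trend statistic (chi-squared scale, 1 df) for a 2x3 table."""
    R, S = sum(cases), sum(controls)
    N = R + S
    n = [r + s for r, s in zip(cases, controls)]  # genotype (column) totals
    T = sum(t * (r * S - s * R) for t, r, s in zip(weights, cases, controls))
    var = (R * S / N) * (N * sum(t * t * c for t, c in zip(weights, n))
                         - sum(t * c for t, c in zip(weights, n)) ** 2)
    # A degenerate table (e.g. one genotype column only) carries no trend information.
    return 0.0 if var <= 0 else T * T / var


def _dirichlet(rng: random.Random, alphas: Sequence[float]) -> List[float]:
    """Draw from a Dirichlet distribution via normalized Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def _multinomial(rng: random.Random, size: int, probs: Sequence[float]) -> List[int]:
    """Draw multinomial counts by sampling `size` category labels."""
    counts = [0] * len(probs)
    for k in rng.choices(range(len(probs)), weights=probs, k=size):
        counts[k] += 1
    return counts


def ppp_value(cases: Sequence[int], controls: Sequence[int],
              prior: Sequence[float] = (1.0, 1.0, 1.0),
              draws: int = 2000, seed: int = 0) -> float:
    """Monte Carlo posterior predictive p-value for the trend statistic (sketch)."""
    rng = random.Random(seed)
    observed = catt_chi2(cases, controls)
    # Under H0 the pooled genotype counts update the shared Dirichlet prior.
    pooled = [r + s for r, s in zip(cases, controls)]
    posterior = [a + c for a, c in zip(prior, pooled)]
    R, S = sum(cases), sum(controls)
    exceed = 0
    for _ in range(draws):
        p = _dirichlet(rng, posterior)
        # Case and control totals stay fixed, as in a case-control design.
        rep = catt_chi2(_multinomial(rng, R, p), _multinomial(rng, S, p))
        if rep >= observed:
            exceed += 1
    return exceed / draws
```

For example, `catt_chi2([10, 20, 30], [30, 20, 10])` returns `20.0`, while `ppp_value([20, 20, 20], [20, 20, 20])` returns `1.0`: the observed statistic is zero, so every replicate meets it. Note that the replicated tables are not conditioned on the genotype totals, which is the feature that distinguishes this kind of test from a conditional exact test.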
For quantitative traits, we present two novel asymptotic tests for studies that follow an adaptive tail sampling design, in which only subjects whose trait values lie above or below adaptive thresholds are genotyped. Finally, we use adaptive thresholds to transform the sampled quantitative data into a binary trait, and apply the PPP method to analyze simulated data. We show that tail sampling can yield a considerable gain in power when the number of genotyped subjects is restricted to a small sample.
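A minimal sketch of the tail-sampling idea follows, under simplifying assumptions that are ours rather than the thesis's: genotypes are simulated under Hardy-Weinberg proportions with an additive effect on a normally distributed trait, and the adaptive thresholds are replaced by fixed empirical quantiles of the observed trait values. Subjects above the upper threshold are coded 1 and those below the lower threshold are coded 0, which yields a 2×3 genotype table to which a binary-trait test such as PPP could then be applied. The function names (`simulate_cohort`, `tail_sample`, `tail_table`) and parameters (`maf`, `beta`, `fraction`) are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def simulate_cohort(n: int, maf: float, beta: float,
                    seed: int = 0) -> Tuple[List[int], List[float]]:
    """Simulate genotypes (0/1/2 copies of the minor allele) and a quantitative trait.

    Assumes Hardy-Weinberg proportions and an additive genetic effect plus N(0, 1) noise.
    """
    rng = random.Random(seed)
    genotypes, traits = [], []
    for _ in range(n):
        # Two independent alleles, each the minor allele with probability maf.
        g = int(rng.random() < maf) + int(rng.random() < maf)
        genotypes.append(g)
        traits.append(beta * g + rng.gauss(0.0, 1.0))
    return genotypes, traits


def tail_sample(traits: Sequence[float], fraction: float) -> Tuple[List[int], List[int]]:
    """Return indices of the lowest and highest `fraction` of subjects by trait value.

    Empirical quantiles stand in for the adaptive thresholds; only the returned
    subjects would be sent for genotyping.
    """
    n = len(traits)
    k = int(n * fraction)  # number of subjects taken from each tail
    order = sorted(range(n), key=lambda i: traits[i])
    return order[:k], order[n - k:]


def tail_table(genotypes: Sequence[int], lower: Sequence[int],
               upper: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Dichotomize the sampled subjects: upper tail -> 1 (row 0), lower tail -> 0 (row 1)."""
    high, low = [0, 0, 0], [0, 0, 0]
    for i in upper:
        high[genotypes[i]] += 1
    for i in lower:
        low[genotypes[i]] += 1
    return high, low
```

With `fraction=0.1` and 1,000 phenotyped subjects, only 200 are genotyped. The intuition behind sampling the extremes is that, under an additive effect, the upper tail tends to be enriched for the allele that shifts the trait upward and the lower tail depleted of it, so each genotyped subject contributes more contrast than a randomly chosen one.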