|M.Sc Student||Novich Gal|
|Subject||Tree-Test: an association test for observation on a|
|Department||Department of Computer Science|
|Supervisors||Prof. Roy Kishony, Associate Prof. Zohar Yakhini|
In classical statistics, an association between two variables is typically evaluated by collecting independent observations and applying a statistical test such as Fisher's exact test. In many real-world applications, however, the assumption of sample independence does not hold. In population genetics, for example, the collected observations are known not to be independent, because they all share a common ancestor.
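As a point of reference for the classical setting described above, the following is a minimal sketch of a two-sided Fisher's exact test on a 2x2 contingency table, written with the Python standard library only. The function name and the example counts are illustrative and do not come from the thesis. The sketch assumes independent observations, which is exactly the assumption questioned here.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Illustrative helper (not from the thesis); assumes independent samples.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    total = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    lo = max(0, col1 - row2)   # smallest feasible top-left count
    hi = min(row1, col1)       # largest feasible top-left count
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one
    # (a small relative tolerance guards against floating-point ties).
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# Hypothetical counts: 3 samples with both traits, 3 samples with neither.
p_value = fisher_exact_two_sided(3, 0, 0, 3)
print(p_value)
```

With these hypothetical counts the p-value is about 0.1. If the six samples were closely related rather than independent, the effective sample size would be smaller and this p-value would overstate the evidence for association.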
To address the confounding that arises from the evolutionary dependence between samples, geneticists have developed tree-based statistics. These tools aim to account for a hierarchical dependency structure among the samples, dictated by the topology of their ancestry: a “family” tree. The current state-of-the-art association tests use Monte Carlo simulations to account for these dependency structures. However, the computational power needed to apply them is not negligible, making them unscalable for big-data analysis.
In our work, we introduce a generalized, simulation-free, analytic test that accounts for hierarchical sample dependency structures. We formulate our model assumptions and compare the performance of our test with that of existing state-of-the-art methods. Our method is widely applicable, as hierarchical sample dependency structures exist in many types of real-world data. To showcase the strength and generality of our method, we analyze big-data observational case studies based on social media data sourced from YouTube and Wikipedia.