M.Sc Thesis

M.Sc Student: Novich Gal
Subject: Tree-Test: an association test for observation on a directed tree
Department: Department of Computer Science
Supervisors: Prof. Roy Kishony


In classical statistics, an association between two variables is typically evaluated by collecting independent observations and applying a statistical test such as Fisher's exact test. However, in many real-world applications the assumption of sample independence does not hold. In population genetics, for example, collected observations are known not to be independent, because they all share a common ancestor.
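As a point of reference, the classical independent-sample test named above can be sketched in Python. This is a minimal illustration using only the standard library, and it is not the method developed in this thesis; the function name `fisher_exact_two_sided` and the example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumes the observations are independent, which is exactly the
    assumption that fails for samples related through a tree.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    # The small relative tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-9)
    )


# Hypothetical counts: 3 samples with both traits, 3 samples with neither.
p_value = fisher_exact_two_sided(3, 0, 0, 3)
print(p_value)  # approximately 0.1
```

If the six samples in this toy table were in fact two clades of three near-identical descendants, the effective number of independent observations would be closer to two, and the p-value above would overstate the evidence for an association.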

To address the confounding that arises from the evolutionary dependence among samples, geneticists have developed tree-based statistics. These tools aim to account for the hierarchical dependency structure of the samples, which is dictated by the topology of their ancestry: a “family” tree. The current state-of-the-art association tests use Monte Carlo simulations to account for these dependency structures. However, the computational cost of these simulations is substantial, which makes them impractical to scale to big-data analysis.

In our work, we introduce a generalized, simulation-free, analytic test that accounts for hierarchical sample dependency structures. We formulate the model assumptions and compare the test's performance to the existing state of the art. Our method is widely applicable, as hierarchical sample dependency structures exist in many types of real-world data. To showcase the strength and generality of our method, we present an analysis of big-data observational case studies of social media data sourced from YouTube and Wikipedia.