M.Sc Thesis

M.Sc Student: Alperovich Dalia
Subject: Efficient Search for Optimally Enriched Combination of Ranked Lists
Department: Department of Computer Science
Supervisors: ASSOCIATE PROF. Zohar Yakhini, PROF. Yael Mandel-Gutfreud
Full Thesis text: Full thesis text - English Version


In this thesis, we describe the results of three research projects, all related to the analysis of molecular measurement data, mostly transcriptomic data. The first and central one addresses the aggregation of ranked lists and the analysis of associations between multiple ranked lists. In the second project, we developed a software tool to perform flexible survival analysis based on quantitative data such as gene expression. In the third project, we describe bioinformatics tools used in supporting a collaborative cancer research study.

Biological measurement data are often reported as a ranked list of quantities, for example the differential expression (DE) of genes as inferred from microarrays or RNA-seq in a cohort of samples.

Recent years have brought considerable progress in statistical tools for enrichment analysis in ranked lists. Several tools are now available that allow users to break the “fixed set” paradigm in assessing the statistical enrichment of sets of genes. In the case of gene expression, these tools can identify factors that may be associated with measured differential expression.
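As an illustration of enrichment without a fixed cutoff, the following minimal sketch scans every prefix of a ranked list and keeps the most significant hypergeometric tail. This corresponds to a minimum-hypergeometric style score; the abstract does not name the statistic used in the thesis, so the choice of statistic and the function names `hypergeom_tail` and `min_hypergeom_score` are assumptions made for this example.

```python
from math import comb

def hypergeom_tail(N, B, n, b):
    """P(X >= b) for X ~ Hypergeometric(N items, B marked items, n draws)."""
    total = comb(N, n)
    upper = min(n, B)
    return sum(comb(B, k) * comb(N - B, n - k) for k in range(b, upper + 1)) / total

def min_hypergeom_score(labels):
    """labels: list of 0/1 flags in ranked order (1 = gene belongs to the set).

    Returns (score, cutoff): the smallest hypergeometric tail over all
    prefixes of the ranking and the prefix length at which it occurs.
    """
    N = len(labels)
    B = sum(labels)
    best_score, best_cutoff = 1.0, 0
    hits = 0
    for n in range(1, N + 1):
        hits += labels[n - 1]          # marked items among the top n
        p = hypergeom_tail(N, B, n, hits)
        if p < best_score:
            best_score, best_cutoff = p, n
    return best_score, best_cutoff

# Toy ranking (hypothetical data): all three set members sit at the top.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
score, cutoff = min_hypergeom_score(toy_labels)
```

Because the minimum is taken over many cutoffs, the returned score is not itself a p-value; it would still need a correction for the multiple cutoffs examined.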

A drawback of existing tools, however, is their focus on identifying single factors associated with the observed or measured ranks; they fail to address relationships between the factors. For example, genes targeted by multiple miRNAs may play a central role in the DE signal even though the effect of each single miRNA is too subtle to be detected. We describe several such examples in cancer expression datasets.

We propose statistical and algorithmic approaches for selecting a sub-collection of factors that can be aggregated into one ranked list that is, heuristically, most associated with an input ranked list (the pivot; in the above example, the DE ranking). We examine the performance on simulated data and apply our approach to cancer datasets. We find small sub-collections of miRNAs that are statistically associated with gene DE in several types of human cancer, suggesting miRNA cooperativity in driving disease-related processes. Many of our findings are consistent with known roles of miRNAs in cancer, while others suggest previously unknown roles for certain miRNAs. We also present applications to other problems in molecular biology, such as the relation between long non-coding (Linc) RNAs and chromatin modifiers in mouse, and the relation between differential expression and chromatin modification sites in yeast.
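The idea of searching for a sub-collection whose aggregate is most associated with the pivot can be sketched as a greedy forward selection. This is only an illustration under stated assumptions: the aggregation rule (summing ranks), the association measure (Spearman correlation) and the names `ranks`, `association` and `greedy_select` are choices made for this example, not the statistics or algorithms of the thesis.

```python
from math import sqrt

def ranks(scores):
    """Rank positions (0 = highest score); ties broken by input order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for pos, i in enumerate(order):
        result[i] = pos
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def association(pivot_ranks, rank_lists):
    """Spearman correlation between the pivot and the rank-sum aggregate."""
    summed = [sum(col) for col in zip(*rank_lists)]
    aggregate = ranks([-v for v in summed])   # small rank sum = top of list
    return pearson(pivot_ranks, aggregate)

def greedy_select(pivot_ranks, candidates, max_size=3):
    """Add one list at a time while the association strictly improves."""
    chosen, best = [], float("-inf")
    remaining = set(candidates)
    while remaining and len(chosen) < max_size:
        gains = {
            name: association(pivot_ranks,
                              [candidates[c] for c in chosen + [name]])
            for name in remaining
        }
        name = max(sorted(gains), key=gains.get)
        if gains[name] <= best:
            break
        chosen.append(name)
        best = gains[name]
        remaining.remove(name)
    return chosen, best

# Toy data (hypothetical): neither "a" nor "b" alone reproduces the pivot,
# but their rank sum does; "c" is the reversed pivot.
pivot = [0, 1, 2, 3, 4, 5]
lists = {"a": [0, 2, 1, 4, 3, 5],
         "b": [1, 0, 3, 2, 5, 4],
         "c": [5, 4, 3, 2, 1, 0]}
selected, score = greedy_select(pivot, lists)
```

A greedy search of this kind examines only a small part of the space of sub-collections, so it is a heuristic and may miss the best combination.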

In two other projects described in this thesis, we propose and present an implementation of an algorithm for flexible threshold survival analysis based on differential expression and a method for using RNA-seq data to identify genes that significantly change during differentiation in stroma and cancer cells.