Ph.D. Thesis

Ph.D. Student: Ben-Zaken Zilberstei Chaya
Subject: Spotting Regulatory Elements by Micro-Arrays
Department: Department of Computer Science
Supervisor: Associate Prof. Zohar Yakhini


A comparison of the human and mouse genomes shows that the vast majority of genes are nearly identical in the two organisms. Many of the differences between human and mouse most likely stem from variations in the gene regulatory networks that determine the expression levels of genes.

To date, despite the tremendous success of genome sequencing efforts and the many complete genome sequences now available, the corresponding regulatory networks remain mostly unknown. The first step towards elucidating these networks is to identify the individual regulatory elements that compose them. In this thesis we developed two programs aimed at deciphering some of these elements.

The program RIMPairFinder is a data mining tool for discovering phrases composed of pairs of cis-regulatory motifs and logical operations between them. These phrases reflect the joint activity of the corresponding regulatory factors that control gene expression under a cell condition of interest. The first factor is a transcription factor protein that binds to gene promoters; the second is a protein that influences mRNA stability and binds to 3'UTRs.
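To make the notion of a pair-phrase concrete, here is a minimal sketch. It assumes that each motif is represented by the set of genes in which it occurs (a promoter motif or a 3'UTR motif), that the logical operations are AND, OR and AND NOT, and that a phrase is scored by a hypergeometric tail p-value for enrichment in a target gene set (for example, genes induced under the condition of interest). The names `OPS`, `phrase_pvalue` and the choice of test are hypothetical illustrations, not the statistics used in RIMPairFinder.

```python
from math import comb

def hypergeom_tail(N, B, n, b):
    """P(X >= b) for X ~ Hypergeometric: n genes drawn from N, of which B are targets."""
    upper = min(n, B)
    total = sum(comb(B, i) * comb(N - B, n - i) for i in range(b, upper + 1))
    return total / comb(N, n)

# Hypothetical logical operations between a promoter motif and a 3'UTR motif.
OPS = {
    "AND": lambda a, u: a & u,      # both motifs present
    "OR": lambda a, u: a | u,       # at least one motif present
    "AND_NOT": lambda a, u: a - u,  # promoter motif present, 3'UTR motif absent
}

def phrase_pvalue(promoter_genes, utr_genes, op, target, universe):
    """Score the phrase 'promoter_motif OP utr_motif' by enrichment in `target`."""
    genes = OPS[op](promoter_genes, utr_genes) & universe
    N = len(universe)
    B = len(target & universe)
    b = len(genes & target)
    return hypergeom_tail(N, B, len(genes), b), genes

universe = set(range(10))
target = {2, 3, 7}  # e.g. genes induced under the condition of interest
p, genes = phrase_pvalue({0, 1, 2, 3}, {2, 3, 4, 5}, "AND", target, universe)
print(sorted(genes), p)  # p = C(3,2) * C(7,0) / C(10,2) = 3/45
```

Representing motifs as gene sets means that every logical operation is a single set operation, so evaluating one phrase costs time linear in the number of genes involved.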

At the core of RIMPairFinder is an efficient algorithm that allows the statistical evaluation of a large number of pair-phrases. Its efficiency relies on a branch-and-bound method that prunes the search space of phrases without any loss of output optimality.
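The sketch below illustrates how branch and bound can prune a search over AND phrases without losing the optimum. The bound is an assumption chosen for illustration, not necessarily the one used in RIMPairFinder. The genes of "A AND X" form a subset of the genes of A, so such a phrase contains at most h target genes, where h is the number of target genes carrying A. A hypergeometric tail P(X >= k) only grows as more genes are drawn, and C(B,k)/C(N,k) does not increase with k, so no such subset can score better than C(B,h)/C(N,h). If that optimistic value is already no better than the best p-value found so far, every partner of A can be skipped. All function names are hypothetical.

```python
from math import comb

def hypergeom_tail(N, B, n, b):
    """P(X >= b) for X ~ Hypergeometric(N genes, B targets, n drawn)."""
    return sum(comb(B, i) * comb(N - B, n - i)
               for i in range(b, min(n, B) + 1)) / comb(N, n)

def optimistic_pvalue(N, B, h):
    """Lower bound on the p-value of any gene subset holding at most h targets."""
    return comb(B, h) / comb(N, h)

def best_and_phrase(promoter_sets, utr_sets, target, universe):
    """Branch and bound over 'promoter AND utr' phrases; returns the minimal p-value."""
    tgt = target & universe
    N, B = len(universe), len(tgt)
    bound = {a: optimistic_pvalue(N, B, len(genes & tgt))
             for a, genes in promoter_sets.items()}
    best_p, best_pair, evaluated = 1.0, None, 0
    for a in sorted(promoter_sets, key=bound.get):
        if bound[a] >= best_p:
            break  # this motif and all later ones (larger bounds) cannot improve
        for u, utr_genes in utr_sets.items():
            genes = promoter_sets[a] & utr_genes & universe
            evaluated += 1
            p = hypergeom_tail(N, B, len(genes), len(genes & tgt))
            if p < best_p:
                best_p, best_pair = p, (a, u)
    return best_p, best_pair, evaluated

universe = set(range(20))
target = {0, 1, 2, 3}
promoters = {"pA": {0, 1, 2, 3, 4}, "pB": {10, 11, 12}}
utrs = {"u1": {0, 1, 2, 3}, "u2": {4, 5}}
result = best_and_phrase(promoters, utrs, target, universe)
print(result)  # pB carries no target genes, so its bound is 1.0 and it is pruned
```

Sorting promoter motifs by their bound lets the loop stop at the first motif that cannot win, since every later motif has a bound at least as large; here only 2 of the 4 possible phrases are evaluated.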

MicroRNAs (miRNAs) are short RNA sequences that can bind to target mRNAs and change their expression levels by altering their stability or marking them for cleavage. MiRNAs are believed to control expression under various conditions, such as stress and stimuli, as well as in specific tissue types. The second program we developed, miRNAXpress, is a high-throughput approach for associating microRNAs with the conditions in which they act, using novel statistical and algorithmic techniques. It comprises two main modules that work in tandem to compute the desired output. The first is an efficient target prediction engine that predicts the mRNA targets of query microRNAs. The second module statistically associates microRNAs with their potential conditions of activity.
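The association module is described here only as statistical. One plausible approach, assumed for illustration, is to rank all genes by their expression change in a condition (most down-regulated first) and ask whether the predicted targets of a miRNA concentrate near the top of that ranking. The sketch computes a minimum-hypergeometric (mHG) style score: the smallest hypergeometric tail p-value over all prefixes of the ranking. The functions `mhg_score` and `associate` and the miRNA and condition names are hypothetical, and the raw score is not itself a calibrated p-value, because it is a minimum taken over many thresholds.

```python
from math import comb

def hypergeom_tail(N, B, n, b):
    """P(X >= b) for X ~ Hypergeometric(N genes, B targets, n drawn)."""
    return sum(comb(B, i) * comb(N - B, n - i)
               for i in range(b, min(n, B) + 1)) / comb(N, n)

def mhg_score(ranked_genes, targets):
    """Minimum hypergeometric tail over all prefixes of the ranked gene list."""
    N = len(ranked_genes)
    B = sum(1 for g in ranked_genes if g in targets)
    best_p, best_n, hits = 1.0, 0, 0
    for n, gene in enumerate(ranked_genes, start=1):
        if gene in targets:
            hits += 1
        p = hypergeom_tail(N, B, n, hits)
        if p < best_p:
            best_p, best_n = p, n
    return best_p, best_n

def associate(mirna_targets, condition_rankings):
    """Score every (miRNA, condition) pair; smaller scores mean stronger association."""
    return {(m, c): mhg_score(ranking, targets)[0]
            for m, targets in mirna_targets.items()
            for c, ranking in condition_rankings.items()}

mirna_targets = {"miR-x": {"g1", "g2"}}  # hypothetical predicted targets
condition_rankings = {                   # genes ordered most down-regulated first
    "stress": ["g1", "g2", "g3", "g4", "g5"],
    "control": ["g5", "g4", "g3", "g2", "g1"],
}
scores = associate(mirna_targets, condition_rankings)
print(scores)
```

Scanning all prefixes avoids fixing an arbitrary cutoff for "down-regulated" in advance; the cost is that significance must then be assessed against the distribution of the minimum, for example by permutation or by an exact dynamic program.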

The first module is the computationally intensive part of miRNAXpress, so an efficient algorithm for this step speeds up the entire process. The target prediction engine is therefore based on an efficient approximate hybridization search algorithm, whose efficiency comes from exploiting the sparsity of the search space without sacrificing the optimality of the results.
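The hybridization search itself is not detailed here, so the following sketch only illustrates the general idea of exploiting sparsity, using a swapped-in, simplified technique. It indexes every 7-mer of the 3'UTR sequences once, enumerates the few words within a given Hamming distance of the site complementary to the miRNA seed (nucleotides 2-8), and looks each word up in the index. Only positions that actually occur in the index are examined, rather than every UTR position. G:U wobble pairs, bulges and free-energy scoring are not modeled; the function names and UTR sequences are hypothetical, while the miRNA is human let-7a.

```python
from collections import defaultdict
from itertools import combinations, product

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    return "".join(COMPLEMENT[base] for base in reversed(rna))

def build_kmer_index(utrs, k=7):
    """Map every k-mer to the (gene, position) pairs where it occurs."""
    index = defaultdict(list)
    for gene, seq in utrs.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((gene, i))
    return index

def hamming_ball(word, d, alphabet="ACGU"):
    """All words within Hamming distance <= d of `word`."""
    ball = set()
    for r in range(d + 1):
        for positions in combinations(range(len(word)), r):
            for letters in product(alphabet, repeat=r):
                chars = list(word)
                for pos, letter in zip(positions, letters):
                    chars[pos] = letter
                ball.add("".join(chars))
    return ball

def find_seed_sites(mirna, index, mismatches=1):
    """Sites complementary to miRNA nt 2-8, allowing up to `mismatches` substitutions."""
    site = reverse_complement(mirna[1:8])
    hits = []
    for word in hamming_ball(site, mismatches):
        for gene, pos in index.get(word, []):  # .get does not grow the defaultdict
            distance = sum(x != y for x, y in zip(word, site))
            hits.append((gene, pos, distance))
    return sorted(hits)

utrs = {"geneA": "AAACUACCUCAAA", "geneB": "GGCUACCUGGG", "geneC": "AAAAAAAAAAA"}
index = build_kmer_index(utrs)
let7a = "UGAGGUAGUAGGUUGUAUAGUU"
print(find_seed_sites(let7a, index))
# [('geneA', 3, 0), ('geneB', 2, 1)]
```

The index is built once for all UTRs and reused for every query miRNA; with one allowed mismatch each query costs 22 dictionary lookups plus time proportional to the number of actual hits, independent of the total UTR length.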