M.Sc Thesis

M.Sc Student: Diament Alon
Subject: Three Dimensional Genomic Organization of Eukaryotic Genes is Strongly Correlated with Their Codon Usage, Expression and Function
Department: Electrical and Computer Engineering
Supervisors: Professor Emeritus Ron Pinter


One of the most fundamental questions in biology is what determines the organization of eukaryotic genomes. Although the distribution of genes in eukaryotic genomes has been shown to be non-random, previously reported relations between gene function and genomic organization were relatively weak.

Here we apply a novel tool for assessing functional relatedness: codon usage bias similarity (CUBS), which measures the similarity between genes in terms of their codon and amino acid usage. We demonstrate that CUBS is an unbiased, continuous measure that can be computed for any gene pair from their coding sequences and used as a proxy for functional similarity. CUBS is highly correlated with various experimental features related to gene expression, such as mRNA level similarity and protein abundance similarity. It is also highly correlated with functional gene annotations, such as protein-protein interaction networks and Gene Ontology (GO) terms.
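This abstract does not give the CUBS formula, so the following is only a minimal sketch of the general idea under a stated assumption: each gene is represented by the relative frequencies of the 64 codons in its coding sequence, and two genes are compared by the Euclidean distance between these vectors. Because codon frequencies also determine amino acid composition, such a vector reflects both codon and amino acid usage. The names `codon_frequencies` and `cubs_distance` are hypothetical, and the choice of Euclidean distance is an assumption, not the thesis's definition.

```python
from itertools import product
import math

# All 64 DNA codons in a fixed order, so every gene maps to a vector of the same length.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_frequencies(cds: str) -> list[float]:
    """Return the relative frequency of each of the 64 codons in a coding sequence.

    Assumes `cds` is an in-frame coding sequence. Trailing bases that do not form
    a complete codon, and codons containing non-ACGT characters, are ignored.
    """
    cds = cds.upper()
    counts = dict.fromkeys(CODONS, 0)
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon in counts:
            counts[codon] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no complete ACGT codons")
    return [counts[c] / total for c in CODONS]


def cubs_distance(cds_a: str, cds_b: str) -> float:
    """Hypothetical CUBS-style distance between two coding sequences.

    Euclidean distance between the genes' codon-frequency vectors; smaller values
    mean more similar codon (and hence amino acid) usage. Illustrative stand-in only.
    """
    fa, fb = codon_frequencies(cds_a), codon_frequencies(cds_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))
```

For example, `cubs_distance("ATGAAA", "ATGAAG")` compares two toy sequences that differ only in a synonymous lysine codon (AAA vs. AAG) and returns sqrt(0.5), about 0.707, while two identical sequences give 0.0.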

By analyzing computational models of the three-dimensional (3D) conformation of DNA, built from Chromosome Conformation Capture (3C) data, we show that the functional similarity between genes captured by CUBS is directly and very strongly correlated with their 3D distance. We considered five eukaryotes, analyzed with a unified method for the first time: S. cerevisiae (r = 0.85; p < 10^-323), S. pombe (r = 0.74; p < 10^-323), A. thaliana (r = 0.75; p < 10^-323), mouse (r = 0.96; p < 10^-323) and human (r = 0.87; p < 10^-323).

The result remains highly significant even when controlling for the linear (one-dimensional) organization of genes. In addition, we show that it cannot be explained by other possible variables, such as the GC content or length of the genes, experimental biases, or regional nucleotide properties unrelated to the coding sequence. To this end, we employed a set of statistical tests tailored to the unique nature of the analyzed experimental data. These results indicate that 3D genomic localization is considerably more important in eukaryotes than previously thought and that, although transcription and translation occur in different cellular compartments, codon usage is tightly linked to genome architecture.
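The abstract states that the correlation remains significant when controlling for the linear (1D) organization of genes. One standard way to perform such a control is a partial correlation; the thesis's own tailored tests are not specified here, so the sketch below is an illustrative substitute, not the author's method. In it, `x`, `y` and `z` stand for hypothetical per-gene-pair lists of CUBS distance, 3D distance and 1D (along-chromosome) distance.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a sample has zero variance")
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x: list[float], y: list[float], z: list[float]) -> float:
    """Correlation between x and y after removing the linear effect of z.

    First-order partial correlation:
    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))
    """
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    denom = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        raise ValueError("z is perfectly correlated with x or y")
    return (r_xy - r_xz * r_yz) / denom


if __name__ == "__main__":
    # Toy data: x and y both equal z plus noise terms that are
    # uncorrelated with z and with each other.
    z = [1, 2, 3, 4]
    x = [2, 1, 2, 5]   # z + [1, -1, -1, 1]
    y = [2, -1, 6, 3]  # z + [1, -3, 3, -1]
    print(pearson(x, y))                 # raw correlation
    print(partial_correlation(x, y, z))  # after controlling for z
```

In the toy data, `x` and `y` both partly track `z`, so their raw correlation is 1/3; once `z` is controlled for, the partial correlation is 0 up to floating-point rounding. This is the kind of spurious association that controlling for linear genomic distance is meant to rule out.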