Technion - Israel Institute of Technology, Graduate School
M.Sc Thesis
M.Sc Student: Lena Granovsky
Subject: Hierarchical Clustering Methods for Biological Patterns
Department: Department of Biomedical Engineering
Supervisor: Professor Emeritus Gath Isak


Abstract

Polymorphic (variable) markers that differ between individuals and species are found throughout the non-coding regions of mitochondrial DNA (mtDNA), which makes mtDNA useful for forensic examinations and phylogenetic studies. mtDNA also has a distinctive inheritance pattern: it is inherited from the mother only.

The present study evaluates the comparative performance of several hierarchical clustering algorithms applied to mtDNA sequences. Both commonly accepted evolutionary trees and synthetically generated family trees (pedigrees) serve as benchmarks.


A new metric for calculating dissimilarities between mtDNA sequences is proposed. This metric can be used for calculating distances between individuals or species, based on DNA mutations.


A new method is described for using pedigrees as a reference when comparing hierarchical clustering algorithms. The idea is to represent a pedigree by a dissimilarity matrix based on the differences in generations between its members.
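As an illustration of representing a pedigree by a dissimilarity matrix, the sketch below counts generational steps between two members along their maternal lines. The abstract does not give the exact definition used in the thesis, so the distance rule (steps through the closest common maternal ancestor), the `mother_of` mapping and the toy names are all assumptions made for this example.

```python
def maternal_line(person, mother_of):
    """Return [person, mother, grandmother, ...] up to the founder."""
    line = [person]
    while line[-1] in mother_of:
        line.append(mother_of[line[-1]])
    return line


def generation_distance(a, b, mother_of):
    """Generational steps from a to b via their closest common maternal ancestor.

    Assumed definition for illustration only; returns None when the two
    members share no maternal ancestor in the pedigree.
    """
    depth_a = {p: steps for steps, p in enumerate(maternal_line(a, mother_of))}
    for steps_b, p in enumerate(maternal_line(b, mother_of)):
        if p in depth_a:
            return depth_a[p] + steps_b
    return None


def pedigree_dissimilarity(members, mother_of):
    """Square matrix of generation distances, ordered like `members`."""
    return [[generation_distance(a, b, mother_of) for b in members]
            for a in members]


# Hypothetical toy pedigree: child -> mother
mother_of = {"Beth": "Ann", "Cara": "Ann", "Dana": "Beth", "Eve": "Cara"}
members = ["Ann", "Beth", "Cara", "Dana", "Eve"]
D = pedigree_dissimilarity(members, mother_of)
```

In this toy pedigree the two cousins Dana and Eve are four generational steps apart, while the sisters Beth and Cara are two steps apart.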


Indices for hierarchical cluster validity, based on matrix comparison tests, are employed for the evaluation of the relative performance of the algorithms. 
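A matrix comparison test of this kind can be sketched as follows. The abstract does not name the specific indices, so this example assumes one common choice: the Pearson correlation between corresponding off-diagonal entries of two dissimilarity matrices (known as the cophenetic correlation when one of the matrices holds the merge heights of a dendrogram). The function names are illustrative.

```python
from math import sqrt


def upper_triangle(matrix):
    """Entries above the diagonal, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def matrix_correlation(a, b):
    """Pearson correlation between the off-diagonal entries of two
    symmetric matrices of the same size.

    Not defined (division by zero) if either matrix has constant
    off-diagonal entries.
    """
    x, y = upper_triangle(a), upper_triangle(b)
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical reference matrix and a candidate with the same structure
reference = [[0, 1, 4], [1, 0, 4], [4, 4, 0]]
candidate = [[0, 2, 8], [2, 0, 8], [8, 8, 0]]
score = matrix_correlation(reference, candidate)
```

A score close to 1 indicates that the candidate hierarchy preserves the ordering and relative spacing of the reference dissimilarities; values near 0 indicate little agreement.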


The results suggest that UPGMA is the method of choice for classifying both pedigrees and phylogenetic trees. Ward's algorithm and the complete link method performed well on the phylogenetic trees but poorly on the pedigrees. Conversely, the single link method produced a good classification of the pedigrees and the worst classification of the evolutionary trees.
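For reference, the single link, complete link and UPGMA methods differ only in how the distance between two clusters is defined: the minimum, the maximum, or the mean of all pairwise dissimilarities. The sketch below is a naive implementation written for this summary, not the code used in the thesis; it returns the matrix of merge heights (cophenetic matrix) for a given dissimilarity matrix. Ward's algorithm is omitted because it relies on a variance-based criterion rather than a simple function of pairwise dissimilarities. The function name and the toy data are assumptions.

```python
def cophenetic_matrix(dist, linkage="average"):
    """Naive agglomerative clustering on a symmetric dissimilarity matrix.

    Returns a matrix whose (i, j) entry is the height at which items
    i and j were first merged into the same cluster.
    """
    n = len(dist)
    clusters = [[i] for i in range(n)]
    coph = [[0.0] * n for _ in range(n)]

    def between(c1, c2):
        pairwise = [dist[i][j] for i in c1 for j in c2]
        if linkage == "single":
            return min(pairwise)
        if linkage == "complete":
            return max(pairwise)
        if linkage == "average":  # UPGMA
            return sum(pairwise) / len(pairwise)
        raise ValueError("unknown linkage: " + linkage)

    while len(clusters) > 1:
        # Find the closest pair of clusters under the chosen linkage.
        height, a, b = min(
            (between(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        # Record the merge height for every newly joined pair of items.
        for i in clusters[a]:
            for j in clusters[b]:
                coph[i][j] = coph[j][i] = height
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return coph


# Hypothetical data: four items at positions 0, 1, 3 and 7 on a line
points = [0, 1, 3, 7]
dist = [[abs(p - q) for q in points] for p in points]
single = cophenetic_matrix(dist, "single")
complete = cophenetic_matrix(dist, "complete")
upgma = cophenetic_matrix(dist, "average")
```

On this toy input the first and last items are joined at height 4 under single link and at height 7 under complete link, with UPGMA falling in between, which shows how strongly the linkage rule can change the resulting hierarchy.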