|M.Sc Thesis||Department of Computer Science|
|Supervisor:||Prof. Pinter Ron|
Genome-scale metabolic networks are now being reconstructed for a variety of organisms. As this kind of data accumulates, the need for analysis tools increases. A notable tool is a pathway alignment finder that enables the detection of conserved metabolic pathways among different species or divergent metabolic pathways within a specie. When comparing two pathways, it should be powerful enough to take into account both the pathway network topology as well as the nodes' labels (e.g. the proteins they denote), and allow flexibility by matching similar - rather than identitical - pathways. To this end, it is important to have a similarity score that allows us to compare and align pathways in the same way we do with nucleotide and protein sequences.
In this thesis we suggest a formalization that captures the intuitive notion of pathway similarity in terms of Approximate Labeled Subgraph Homeomorphism (ALSH), as follows: Given two directed labeled graphs P and T and a predefined label-to-label similarity score matrix, compute the homeomorphic subgraph of T which is structurally most similar to P and which also maximizes the overall node-to-node resemblance. We present solutions to the ALSH problem where P and T are trees (both rooted and unrooted, both ordered and unordered). Their complexity is polynomial, where the degree depends on the type of trees allowed.
We also present a prototype tool that, given a query pathway and a database of pathways, will find and report all approximate occurrences of the query in the database, ranked by similarity score. The program also supports a visualization tool with which the alignment of two similar pathways can be graphically displayed.