M.Sc Thesis

M.Sc Student: Yogev Sivan
Subject: Evaluation of Scoring Functions for Protein Multiple Sequence Alignment using Structural Alignments
Department: Department of Computer Science
Supervisor: Professor Emeritus Shlomo Moran


The process of aligning a group of protein sequences to obtain a Multiple Sequence Alignment (MSA) of these sequences is a basic tool in current bioinformatics research. The development of new MSA algorithms raises the need for an efficient way to evaluate the quality of an alignment, in order to select the best alignment among those produced by the available algorithms. A natural solution to this problem is a scoring function, which assigns to each alignment a number reflecting its quality. Various scoring functions for MSA have been proposed over the years, but their quality has not been comprehensively assessed. In this work we present an evaluation scheme for MSA scoring functions based on structural alignments.
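To illustrate what an MSA scoring function looks like, the sketch below implements a sum-of-pairs (SP) score, one widely used family of such functions, and uses it to pick the best of several candidate alignments. The match, mismatch and gap values are hypothetical toy parameters standing in for a substitution matrix such as BLOSUM62, and the function names are illustrative; this is not presented as any of the specific functions evaluated in this work.

```python
from itertools import combinations

# Hypothetical toy parameters; a real scoring function would typically
# look residue pairs up in a substitution matrix such as BLOSUM62.
MATCH, MISMATCH, GAP = 1.0, -1.0, -2.0


def pair_score(a: str, b: str, gap: float = GAP) -> float:
    """Score two characters that share a column ('-' denotes a gap)."""
    if a == "-" and b == "-":
        return 0.0  # gap-gap pairs are conventionally ignored in SP scoring
    if a == "-" or b == "-":
        return gap  # simple per-position (linear) gap penalty
    return MATCH if a == b else MISMATCH


def sp_score(msa: list[str], gap: float = GAP) -> float:
    """Sum-of-pairs score: add pair_score over every pair of rows and every column."""
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all rows of an MSA must have the same length")
    return sum(
        pair_score(a, b, gap)
        for s, t in combinations(msa, 2)
        for a, b in zip(s, t)
    )


def best_alignment(candidates: list[list[str]], score=sp_score) -> list[str]:
    """Select the candidate MSA that the scoring function rates highest."""
    return max(candidates, key=score)


# Two alignments of the same toy sequences ACGT, ACCGT, ACGT.
c1 = ["AC-GT", "ACCGT", "A-CGT"]
c2 = ["ACG-T", "ACCGT", "AC-GT"]
print(sp_score(c1), sp_score(c2))  # 3.0 1.0
print(best_alignment([c1, c2]))    # ['AC-GT', 'ACCGT', 'A-CGT']
```

Passing `gap=0.0` turns off the gap penalty entirely, which is one simple way to compare a function with and without gap penalties.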

Structural alignments are alignments obtained through analysis of the three-dimensional structures of related proteins. A “structural score” of an alignment can be defined by measuring the extent to which it preserves the conserved regions of the structural alignment of the same sequences. We propose a framework for evaluating the quality of MSA scoring functions, based on testing the correlation between the scores that a given scoring function assigns and the structural scores. The correlation is tested on a set of alignments of sequences whose structural alignment is known.
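The framework can be sketched as follows, under two stated assumptions. First, the structural score is simplified to the fraction of residue pairs aligned in the reference (structural) alignment that the evaluated alignment also aligns; this ignores any distinction between conserved core regions and the rest of the alignment. Second, Spearman rank correlation is assumed as the correlation measure. All names (`aligned_pairs`, `structural_score`, `spearman`, `evaluate_scoring_function`) are hypothetical.

```python
from itertools import combinations
from typing import Callable


def aligned_pairs(msa: list[str]) -> set[tuple[int, int, int, int]]:
    """Return (seq_i, residue_i, seq_j, residue_j) for each residue pair sharing a column.

    Residues are indexed by their position in the ungapped sequence, so the
    pairs are comparable across different alignments of the same sequences.
    """
    counters = [0] * len(msa)
    pairs = set()
    for col in range(len(msa[0])):
        residues = {}  # sequence index -> residue index in this column
        for s, row in enumerate(msa):
            if row[col] != "-":
                residues[s] = counters[s]
                counters[s] += 1
        for i, j in combinations(sorted(residues), 2):
            pairs.add((i, residues[i], j, residues[j]))
    return pairs


def structural_score(test: list[str], reference: list[str]) -> float:
    """Fraction of residue pairs aligned in the reference that the test alignment also aligns."""
    ref = aligned_pairs(reference)
    return len(ref & aligned_pairs(test)) / len(ref) if ref else 0.0


def ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values receiving their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined when all scores are equal")
    return cov / (vx * vy) ** 0.5


def evaluate_scoring_function(
    score_fn: Callable[[list[str]], float],
    candidates: list[list[str]],
    reference: list[str],
) -> float:
    """Correlate a scoring function with the structural score over a sample of alignments."""
    function_scores = [score_fn(msa) for msa in candidates]
    structural_scores = [structural_score(msa, reference) for msa in candidates]
    return spearman(function_scores, structural_scores)
```

A value close to 1 means the scoring function orders the sample alignments almost exactly as the structural score does. Which candidate alignments to include in the sample is a separate question.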

An inherent problem that must be resolved is the choice of an appropriate sample set of alignments for the correlation test. We describe this problem, suggest a solution, and report results obtained with it, based on correlation tests that use the alignment benchmark BAliBASE as a source of structural alignments.

Applying the scheme to a number of scoring functions used in several widely used MSA tools leads to a few interesting conclusions. In general, the ranking of the scoring functions is robust to changes in alignment characteristics. In some cases, functions with different theoretical justifications yield very similar results, and the use of gap penalties does not significantly improve a function's quality.