M.Sc Thesis

M.Sc Student: Hurani Adham
Subject: Schema Matching using an SVM Classifier and User Feedback
Department: Department of Industrial Engineering and Management
Supervisor: Prof. Avigdor Gal


Data integration is becoming increasingly important. Schema matching is a data integration task that establishes correspondences between schema attributes. Matching schemas is often a multi-step process in which each step applies several algorithms, rules, and constraints using schema matchers. A special class of schema matchers is called decision makers. Decision makers determine whether a candidate attribute pair is a match. Over the years, the challenges in schema matching have shifted. At first, the challenge was to find a single dominant schema matcher that performs best regardless of the data model and application domain. Then, researchers aimed at predicting schema matcher performance. A more recent open challenge is user involvement in the matching process, with emphasis on designing a low-burden interaction scheme that reduces the validation effort.

This research focuses on the design of a decision maker coupled with an optimal user-feedback scheme. We choose the support vector machine (SVM), a machine learning algorithm, as the tool of choice. We show how SVM features map to the schema matching world. The thesis also shows how the SVM, as a decision maker, can be extended to an iterative model that supports user involvement in the form of one question at a time. We provide several heuristics for choosing the question posed to the user. Finally, we present an empirical study of the tool and compare its performance with that of other leading tools.
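
The iterative, one-question-at-a-time model can be illustrated with a minimal Python sketch. It is not the implementation from the thesis. It assumes that each candidate attribute pair is described by a feature vector of similarity scores, that a toy linear SVM trained by subgradient descent stands in for a full SVM library, and that the question is chosen by an uncertainty heuristic (the unlabeled pair closest to the separating hyperplane), which is a common active-learning choice and not necessarily one of the heuristics proposed here. All names, pairs, and scores below are hypothetical.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(X, y, epochs=200, lr=0.1, lam=0.01, seed=0):
    """Toy linear SVM: subgradient descent on the regularized hinge loss.
    Labels are +1 (match) and -1 (non-match). Returns (weights, bias)."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    idx = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            violated = y[i] * (dot(w, X[i]) + b) < 1
            w = [wj * (1 - lr * lam) for wj in w]  # shrink (regularization)
            if violated:  # hinge-loss subgradient step
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def decision(model, x):
    """Signed score: positive means 'match', negative means 'non-match'."""
    w, b = model
    return dot(w, x) + b

def pick_question(model, pool):
    """Assumed heuristic: ask about the pair the SVM is least sure of."""
    return min(pool, key=lambda pair: abs(decision(model, pool[pair])))

def feedback_loop(labeled, pool, oracle, budget):
    """Retrain, ask the user one question, add the answer, repeat."""
    labeled, pool, asked = dict(labeled), dict(pool), []
    for _ in range(min(budget, len(pool))):
        model = train_linear_svm([f for f, _ in labeled.values()],
                                 [l for _, l in labeled.values()])
        pair = pick_question(model, pool)
        labeled[pair] = (pool.pop(pair), oracle(pair))  # user validates one pair
        asked.append(pair)
    model = train_linear_svm([f for f, _ in labeled.values()],
                             [l for _, l in labeled.values()])
    return model, asked

# Hypothetical data: (name similarity, type similarity) per attribute pair.
labeled = {("cust_name", "customerName"): ((0.9, 0.8), +1),
           ("cust_name", "orderDate"): ((0.1, 0.2), -1)}
pool = {("addr", "address"): (0.7, 0.9),
        ("zip", "postalCode"): (0.4, 0.6),
        ("phone", "price"): (0.3, 0.1)}
truth = {("addr", "address"): +1, ("zip", "postalCode"): +1,
         ("phone", "price"): -1}

model, asked = feedback_loop(labeled, pool, truth.get, budget=2)
```

Here `truth.get` plays the role of the user answering each question. In an interactive setting the oracle would prompt a person, and `budget` would cap the validation effort.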