Ph.D Thesis

Ph.D Student: Sagi Tomer
Subject: Schema Matching Evaluation
Department: Department of Industrial Engineering and Management
Supervisor: Prof. Avigdor Gal


Solving matching problems entails generating alignments between structured data. Examples include schema matching, process-model matching, ontology alignment, and Web-service composition. The design of software aimed at solving these problems is aided by evaluation-measures of solution quality. We ground our exploration in the schema matching domain and present a couple of applications of this work to other domains as well. Historically, evaluation-measures have been based upon binary set theory, have required an expert-generated exact-match, and have assumed a single expert review following the algorithmic effort. Motivated by new applications, this dissertation extends existing measures and proposes new measures that support evaluation in various scenarios. We begin by generalizing the similarity-matrix abstraction to a vector space on which evaluation-measures are defined. We then define a categorization of evaluation-measures, from which we explore three categories.

We first explore evaluation without an exact-match. To this end, we introduce schema matching prediction. We present a comprehensive framework in which predictors can be defined, designed, and evaluated. We formally define schema matching prediction using similarity spaces and discuss a set of four desirable properties of predictors, namely correlation, robustness, tunability, and generalization. We present a method for constructing predictors and supporting generalization, and introduce prediction models as a means of tuning prediction towards various quality measures. We illustrate the usefulness of schema matching prediction by presenting three use cases pertinent to schema matching research and application. An extensive empirical evaluation shows the usefulness of predictors in these use cases.

Our second exploration extends commonly used set-based measures, which apply in the presence of an exact-match, to support non-binary results as well. Non-binary evaluation is formally defined, together with several new non-binary evaluation-measures. We provide an empirical evaluation to support the usefulness of non-binary evaluation and show its superiority over its binary counterpart in several problem domains.
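The abstract does not give the definitions of the non-binary measures, so the following is only a minimal sketch of one plausible reading, not the thesis's own formulation. It assumes a result is a similarity matrix with entries in [0, 1] and the exact-match is a 0/1 matrix of the same shape. The function names (`binary_precision_recall`, `non_binary_precision_recall`), the 0.5 threshold, and the example values are all hypothetical. The binary variant thresholds the result before counting. The non-binary variant weights each correspondence by its similarity value instead.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def binary_precision_recall(
    result: Matrix, exact: Matrix, threshold: float = 0.5
) -> Tuple[float, float]:
    """Set-based precision/recall after thresholding the result (threshold is an assumption)."""
    tp = fp = fn = 0
    for r_row, e_row in zip(result, exact):
        for r, e in zip(r_row, e_row):
            chosen = r >= threshold
            correct = e == 1
            if chosen and correct:
                tp += 1
            elif chosen:
                fp += 1
            elif correct:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def non_binary_precision_recall(result: Matrix, exact: Matrix) -> Tuple[float, float]:
    """Illustrative non-binary variant: similarity values act as weights (an assumption)."""
    overlap = sum(r * e for r_row, e_row in zip(result, exact)
                  for r, e in zip(r_row, e_row))
    result_mass = sum(r for row in result for r in row)
    exact_mass = sum(e for row in exact for e in row)
    precision = overlap / result_mass if result_mass else 0.0
    recall = overlap / exact_mass if exact_mass else 0.0
    return precision, recall


# Hypothetical 2x2 example: rows and columns are attributes of two schemas.
result = [[0.9, 0.2],
          [0.4, 0.6]]
exact = [[1, 0],
         [0, 1]]

bin_p, bin_r = binary_precision_recall(result, exact)
nb_p, nb_r = non_binary_precision_recall(result, exact)
```

In this example the thresholded result looks perfect under the binary measures, while the non-binary variant still penalizes the similarity mass placed on incorrect correspondences and the less-than-full confidence in the correct ones.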

Our final exploration examines the evaluation of results from multiple human and algorithmic matchers collaborating on the same task. Matching problems have historically been defined as a semi-automated task in which correspondences are generated by matching algorithms and subsequently validated by a single human expert. Emerging pay-as-you-go models are based upon piecemeal human validation of algorithmic results, often using crowd-based validation. An alternative, more symmetric model is presented. We examine the unique aspects of human matchers and those common to human and algorithmic matchers using extensive user studies. Results are discussed in light of their implications for models of human and algorithmic collaboration in schema matching.