M.Sc Student | Ateret Anaby-Tavor |
---|---|

Subject | Enhancing the Formal Similarity Based Matching Model |

Department | Department of Industrial Engineering and Management |

Supervisor | Full Professor Gal Avigdor |

The ambiguous interpretation of
concepts, describing the meaning of data in heterogeneous data sources is
commonly known as *semantic heterogeneity*.

*Semantic heterogeneity* is
a well-known obstacle to data source integration. Semantic heterogeneity is
resolved through a process of *semantic reconciliation*, which matches
concepts from heterogeneous data sources. The shift towards machine
understandable Web resources has unearthed the importance of automatic semantic
reconciliation. However, automatic matching carries with it a degree of
uncertainty, as it is based on syntactic, rather than semantic means. Moreover,
it is commonly acknowledged that no automatic semantic reconciliation algorithm
can be assumed to identify the exact mapping (as analyzed by human) in all
scenarios.

This work addresses the issue of the inherent uncertainty in the automatic matching process outcome, a key problem in the schema integration process.

The
thesis is twofold: We first present the monotonicity principle, which is a
desired attribution of a matching algorithm. We demonstrate through theoretical
and empirical analysis that for a certain family of ''well behaved'' mappings
(we term *monotonic*), one can safely interpret a high similarity measure
as a good semantic mapping. We assert that this formal model offers a solid
foundation for analyzing the quality of existing matching algorithms (a joint
work with A. Trombetta, Università dell’Insubria, Italy). Next, we propose three efficient algorithms for identifying the exact mapping among
the K mappings with highest rank (a joint work with A. Moss, Netanya Academic College). Given two schemata with n_{1} and n_{2} concepts,
the first algorithm identifies the second best mapping in a time complexity of
O(n) C(A_{best}), where n=max(n_{1},n_{2}), A_{best}
is the best (complexity-wise) algorithm for finding a maximum weight mapping in
a bipartite graph, and C(A_{best}) denotes the complexity of the
algorithm A_{best}. The second algorithm identifies the second best
mapping in a time complexity of O(n^{3}), regardless of A_{best}.
We then show that these algorithms do not scale for a general K. Therefore, we
suggest a polynomial algorithm (O(Kn) C(A_{best})) for identifying
top-K mappings.

We assert that an algorithm for mapping schemata that obeys the monotonicity principle would rank the exact mapping close to the best one (as generated by a matching algorithm). Thus, a system (such as a software agent) that is provided with a monotonic algorithm and equipped with our top-K search algorithms and a user query, can rewrite K queries, each with a slightly different mapping in its quest for the exact mapping.