M.Sc Thesis

M.Sc StudentLekhtman Alon
SubjectEfficient Probabilistic Top-K Query Answering Utilizing
Database Indices
DepartmentDepartment of Industrial Engineering and Management
Supervisor PROF. Avigdor Gal
Full Thesis textFull thesis text - English Version


The research into uncertain schema matching gives rise to probabilistic top-k query answering. In this work we propose the use of database indices to improve the performance of such queries. We introduce two new semantics, namely mutual-exclusive and independent, which are inspired by concepts from the probabilistic databases field. Mutual-exclusive semantics indicates that only one, and exactly one mapping is the correct one. Independent semantics indicates that there may be multiple correct mappings, or no correct mappings at all. Together with the by-table and by-tuple semantics, which were introduced in previous works, four different semantics are created. In addition, four different probabilistic top-k query types are introduced. We propose a set of algorithms to support all the introduced query types under all the semantics types and show comparison between our algorithm and an algorithm that was presented in previous work on a specific example. We provide a probabilistic analysis, bounding the number of tuples that need to be accessed before returning a result to a specific query. Moreover, we analyze a simulated dataset and show the results are very similar to the calculated bounded number of tuples. We test the performance of the algorithms using a benchmark dataset and show that the use of indices provide a significant improvement over state-of-the-art methods.