טכניון מכון טכנולוגי לישראל
הטכניון מכון טכנולוגי לישראל - בית הספר ללימודי מוסמכים  
M.Sc Thesis
M.Sc StudentItzhak Fadida
SubjectDevelopment and Implementation of Algorithms for Full
Disjunctions
DepartmentDepartment of Industrial Engineering and Management
Supervisor Full Professor Cohen Sara
Full Thesis textFull thesis text - English Version


Abstract

Nowadays, users are presented with a wealth of sources of information on almost every topic of interest. One of the major challenges in effectively querying a collection of data sources is combining the data together in a coherent fashion without losing information.


The full-disjunction operation is a variation of the join operator that maximally combines tuples from connected relations, while preserving all information in the relations. Full disjunctions can be very useful for integrating heterogeneous data sources (e.g., web sources) since such data sources are often incomplete.


Under certain conditions full disjunctions can be expressed using a series of natural outer joins.

However, for the general case, full disjunctions cannot be expressed using standard relational operators, and thus, special algorithms are needed. Recently, efficient algorithms for computing a full disjunction have been proposed. However, these algorithms have not been implemented within a relational database. Indeed, such an implementation is nontrivial since it must take into consideration the physical constraints of the database in order to ensure efficiency.


We have concentrated on the algorithm IncrementalFD (IFD) from which was previously introduced in a theoretical form. The research includes the theoretical aspects required to implement the IFD algorithm and also includes an implementation on a well-known database called PostgreSQL. Aspects such as block-based execution, I/O costs, storage space, memory, indexing and others are considered to produce several efficient algorithms. Finally, extensive experimentation was performed to test the behavior and performance of the various algorithm alternatives.