M.Sc Thesis

M.Sc StudentZarivach Evelina
SubjectEvaluating Datalog Programs over Infinite and Founded
DepartmentDepartment of Computer Science
Supervisor ASSOCIATE PROF. Joseph Gil
Full Thesis textFull thesis text - English Version


Traditionally, infinite databases were studied as a data model for queries that may contain function symbols (since functions may be expressed as infinite relations). Recently, the interest in infinite databases has been sparked by additional scenarios, e.g., as a formal model of a database of an open-world software or of other relations that may be spread across the Web. In the course of implementing a database system for querying Java software, we found that the universe of Java code can be effectively modeled as an infinite database. This modeling makes it possible to distinguish between queries which are "open-ended," that is, whose result may grow as software components are added into the system, and queries which are "closed," in that their result does not change as the software base grows. Further, closed queries can be implemented much more efficiently than open queries.

This work revisits the weak safety and termination problems for recursive Datalog programs evaluated over infinite databases. In particular, an algorithm is presented that computes all finiteness constraints for the IDB predicates of a program, given a set of finiteness constraints over the EDB predicates.

In addition to being of interest in itself, this algorithm also presents an alternative method to check for weak safety and as a skeleton for query evaluation. A sufficient condition for program termination is also presented, provided that the program and database satisfy certain natural constraints.

These constraints are often satisfied in the context of software analysis problems. For programs that satisfy these constraints, we also provide an algorithm to generate an efficient evaluation scheme of closed queries, which is a generalization of  Vieille's famous QSQR algorithm for top-down evaluation of Datalog programs. A by-product of this work is a rather terse and elegant representation of QSQR.