טכניון מכון טכנולוגי לישראל
הטכניון מכון טכנולוגי לישראל - בית הספר ללימודי מוסמכים  
M.Sc Thesis
M.Sc StudentZeevi Shai
SubjectAnswering (Unions of) Join Queries using Random Access
and Random-Order Enumeration
DepartmentDepartment of Computer Science
Supervisor Professor Benny Kimelfeld
Full Thesis textFull thesis text - English Version


Abstract

As data analytics becomes more crucial to digital systems, so grows the importance of

characterizing the database queries that admit a more efficient evaluation. We consider the tractability yardstick of answer enumeration with a polylogarithmic delay after a linear-time preprocessing phase. Such an evaluation is obtained by constructing, in the preprocessing phase, a data structure that supports polylogarithmic-delay enumeration.

In this thesis, we seek a structure that supports the more demanding task of a “random

permutation”: polylogarithmic-delay enumeration in truly random order. Enumeration

of this kind is required if downstream applications assume that intermediate results are representative of the whole result set in a statistically valuable manner. As we show in the thesis, an even more demanding task is that of a “random access”: polylogarithmictime retrieval of an answer whose position is given.

We establish that the free-connex acyclic CQs are tractable in all three senses:

enumeration, random-order enumeration, and random access; and in the absence of

self-joins, it follows from past results that every other CQ is intractable by each of

the three (under fine-grained complexity assumptions). However, the three yardsticks

are separated in the case of a union of CQs (UCQ): while a union of free-connex

acyclic CQs has a tractable enumeration, it may (provably) admit no random access.

For such UCQs we devise a random-order enumeration whose delay is logarithmic in

expectation. We also identify a subclass of UCQs, for which we can provide random

access with polylogarithmic access time. Finally, we present an implementation and

an empirical study that show a considerable practical superiority of our random-order

enumeration approach over state-of-the-art alternatives .