טכניון מכון טכנולוגי לישראל
הטכניון מכון טכנולוגי לישראל - בית הספר ללימודי מוסמכים  
Ph.D Thesis
Ph.D StudentLivshits Ester
SubjectThe Complexity of Database Inconsistency Measures
DepartmentDepartment of Computer Science
Supervisor Professor Benny Kimelfeld
Full Thesis textFull thesis text - English Version


Abstract

Managing data inconsistency has been one of the major challenges in the research and practice of database management. Database inconsistency arises for different reasons and in different applications. Nowadays, many applications obtain information from imprecise sources (e.g., social networks) via imprecise procedures (e.g., natural-language processing). Inconsistency may also arise when integrating conflicting data from different sources. During the past two decades, researchers have established, developed and investigated a principled approach to managing database inconsistency via the notion of database repairs. A repair of an inconsistent database is traditionally defined as a consistent database that differs from the inconsistent one in a ``minimal'' way.


We investigate various problems arising in the challenge of measuring how inconsistent a database is. The problem of measuring inconsistency has been studied extensively by the Knowledge Representation and Logic communities, and has been recently acknowledged by the database community. Inconsistency measures are important for estimating the extent to which a database is trustworthy, and the effort required to clean it. Specifically, we explore the computational complexity of two basic inconsistency measures. The first measure is based on the cost of a minimal repair (i.e., the minimal number of operations required to obtain consistency), and the second is based on the number of repairs. We focus on data complexity (where the schema is considered fixed and the input consists of a database instance) and establish dichotomies in (i.e., a full classification of) data complexity for the entire space of sets of functional dependencies.


Finally, repairs are often not equally legitimate, as it is desired to prefer one over another; for example, one tuple is regarded more reliable than another, or a more recent tuple should be preferred to an earlier one. Motivated by these considerations, researchers have introduced and investigated the framework of preferred repairs that incorporates preferences among database tuples. We revisit the second measure in the presence of preferences among tuples in the database. We show that the presence of preferences significantly affects the computational complexity.