|M.Sc Thesis||Department of Computer Science|
|Supervisor:||Prof. Shmueli Oded|
As XML becomes an increasingly popular format of representing information, it becomes more important to verify the integrity of this information. XML Schema is a W3C standard of defining constraints over XML documents. A schema can define the structure of documents (i.e., which elements and attributes appear in the document, and the parent-child relationships between them), and the data types of attributes and of simple elements. One of the useful features of XML Schema is the ability to define identity constraints of three kinds: 'key', 'keyref', and 'unique'. In this work we focus on the 'key' and 'keyref' constraints. These constraints are similar to the key and foreign key constraints used in relational databases. However, the hierarchical nature of XML documents complicates the semantics of these constraints.
In our experience, current XML Schema validators provide only the ability to validate a whole document and not the ability to change a document and check efficiently whether the change violates identity constraints. We define several simple operations that change a document (change a value of a node or several nodes, add or delete a sub-tree), and present efficient algorithms for incrementally validating these operations, i.e., algorithms that check whether changes to the document cause violations to identity constraints.
We present an implementation of these algorithms. This implementation is written is Python. It is based on the open-source validator XSV, and extends it with incremental validation capabilities. We perform experiments with our implementation, and show its superiority to validation of identity constraints from scratch using XSV.
Another part of this work involves XPath, a language for navigating in XML documents. XPath allows navigation according to axes that navigate from a node to its parent, children or siblings. We suggest adding foreign-key navigation axes. Given a schema that defines a keyref KR, we suggest adding two axes, one that allows navigation from a node to a node it references (i.e., its 'parent' according to the keyref), and another that allows navigation from a node to nodes that reference it (i.e., its 'children' according to the keyref). We present a formal definition of such axes. Then, we present an algorithm for evaluating an XPath expression that uses such axes. We define a fragment of XPath that is enriched with such axes, and investigate its expressive power. We also discuss the special case in which the keyref definition contains only one field.