|M.Sc Student||Abboud Amir|
|Subject||Monitoring General Functions in Distributed Systems with|
|Department||Department of Computer Science||Supervisors||Professor Assaf Schuster|
|Professor Daniel Keren|
|Full Thesis text|
Many monitoring tasks over distributed data streams can be formulated as a continuous query using a function that is defined over the global average of data vectors derived from the streams. The query will typically produce an alert when the value of the function crosses a predefined threshold. A fundamental problem in efficient scalable implementation of such threshold queries is that the data streams are distributed, sometimes over a wide geographical region. Moving all the data to a centralized data center for query processing may incur infeasible communication overheads and inflated data center resource costs. In some cases it may be prohibited altogether by the sheer aggregated size of the data, or by privacy laws. The goal is thus to enhance scalability by processing the query locally, using as little communication and global coordination as possible. We present a novel scheme for communication reduction in distributed monitoring using local constraints. Communication and global coordination are required only in the event that the local constraints are violated by the incoming data. Our work improves on previous work in a few critical aspects. First, whereas previous work required constructing a “distributed cover” of the entire convex hull of the local data vectors, our work compiles constraints that are designed to cover only the global average; further, they are directly matched and tailored to fit the local data distribution at each stream. The result is a dramatic decrease in the required volume of communication compared to previous state of the art, up to two orders of magnitude in our experiments with real-life data. Both the experiments and theoretical study suggest that the improvement factor increases with the dimension of the data. Also, in contrast to previous work, which necessitated complicated constraints and required enormous computational effort over each of the streams, our scheme can use very simple constraints which incur negligible local overhead. This latter advantage makes our new approach applicable to thin, battery-operated sensors and cellular devices.