Ph.D Thesis

Ph.D StudentKolchinskyy Ilya
SubjectLazy Evaluation Methods for Complex Event Processing
DepartmentDepartment of Computer Science
Supervisor PROF. Assaf Schuster
Full Thesis textFull thesis text - English Version


Rapid advances in data-driven applications over recent years have intensified the need for efficient real-time detection of arbitrarily complex patterns in massive data streams. Complex event processing (CEP) is a prominent technology widely employed for performing this task in many areas, including online finance, security monitoring, credit card fraud detection, and IoT (Internet of Things) technologies. An increasingly active and rapidly developing area of academic research, CEP functionality is also provided by multiple open source libraries and commercial data analysis platforms.

CEP engines operate by collecting basic data items arriving from input data streams and using them to infer occurrences of complex events according to the user-defined patterns. To that end, data items are combined into higher-level entities matching the pattern structure. To guarantee detection correctness, a CEP system is required to actively maintain all subsets of data items that might eventually become a part of a successful pattern match. As a result, the overall number of such potential matches grows exponentially with the size and the sophistication of the pattern being detected. Considering that real-life patterns typically incorporate a highly convoluted structure and may consist of 10 or more events connected by increasingly complex operators, this situation makes large-scale complex event processing virtually infeasible even for businesses capable of acquiring extensive computation power. In addition, modern CEP applications are often required to process hundreds or even thousands of patterns and streams in parallel under tight real-time constraints, which increases the magnitude of the problem.

In this thesis, we present a novel method for overcoming the exponential resource requirements of complex event processing. Our solution is based on the principle of 'statistic-based lazy evaluation'. Under this paradigm, incoming data items are allowed to be processed in an order different from their natural order of appearance in an input

stream. As a result, statistical properties of the underlying data, such as the arrival frequencies of different types of items and the selectivities (probabilities of success) of the pattern-defined constraints, can be utilized to construct evaluation plans providing close-to-optimal detection performance.

We describe an efficient lazy evaluation mechanism for complex event processing based on nondeterministic finite automata. By exploiting the similarity of our problem to the well-known problem of join query plan generation, we develop a procedure for adapting the existing join-based algorithms to the CEP domain, thus creating a family of algorithms for generating practically effective pattern detection plans. As the statistical data properties required for plan generation are rarely known in advance and may change dynamically, we devise an efficient and precise adaptive method that continuously estimates the current statistic values of the required data characteristics

on-the-fly and, if and whenever necessary, modifies the evaluation plan accordingly. Finally, we extend our methods to multi-pattern CEP environment, showing how the lazy evaluation approach can facilitate common subexpression sharing between different patterns in a workload. An extensive theoretical and empirical analysis of our innovations demonstrates their superiority over state-of-the-art approaches.