M.Sc Thesis


M.Sc StudentKosman Eitan
SubjectComplex Pattern Mining
DepartmentDepartment of Computer Science
Supervisors PROF. Assaf Schuster
DR. Ilya Kolchinsky


Abstract

Mining complex patterns from large data sets has attracted much attention in the last few decades, due to the explosion of the amounts of data available on the web.

In the simplest scenario, a pattern consists of a set of elementary items reveals a certain phenomenon that occur within a database.

The problem of generating these patterns involves detection of complex dependencies between the various data types within the database.

This field is of great importance in areas where analysis of data provides highly valuable insights regarding behavioral characteristics, market basket analysis for instance, with a potential for a very wide financial impact.

The availability of data, and even more importantly, the development of pattern mining methodologies, breaks the barrier of understanding and characterizing behaviors, which enables further improvement and optimization of services.


A plethora of methods and algorithms have been designed for mining a variety of patterns, ranging from simple association rules and frequent itemsets to advanced graph-based structures.

At the very beginning of the field, pattern mining algorithms were prone to high complexity.

Furthermore, algorithms were often limited to specific data types which prevented their adoption to general uses.

Indeed, as modern applications grow dramatically more sophisticated and operate on highly multidimensional and increasingly complex data, they introduce the demand for mining even more expressive and convoluted patterns unsupported by the current state-of-the-art techniques.

As a result, more powerful and expressive pattern mining approaches are needed.


To address this limitation, in this research I develop a new pattern model with substantially more expressive power than the currently used model which is able to summarize advanced insights from databases.

The new model generalizes the notion of a pattern by considering multidimensional inter-item relations of arbitrary form and complexity.

Furthermore, in order to support the mining procedure of patterns that meets the characteristics of the new model, I developed methods for efficiently mining this new notion of a pattern from large datasets.

These methods are validated on diverse real-world and synthetic datasets in an extensive experimental study, in which I show that the proposed methods are able to generate expressive, representative and comprehensible patterns.

Additionally, I present a thorough analysis of the proposed methods and show that they are able to generate meaningful patterns efficiently, both quantitatively and qualitatively.