M.Sc Thesis

M.Sc StudentMaoz Nadav
SubjectThe Coctail Party Effect for prior Knowledge Free, Event
Discovery in Concurrent, Heterogeneous, Time
Series Data
DepartmentDepartment of Electrical and Computer Engineering
Supervisor PROF. Shie Mannor


In many real world scenarios, measurements are simultaneously collected from multiple sources, yielding a large set of heterogeneous time series. Recurring events can affect only a small subset of the series, as in localized phenomena in a (spatially distributed) sensor network. We propose a novel framework for discovering such patterns, complete with a Minimum Description Length based score function allowing hyperparameter optimization without requiring ground truth. Typical pattern mining methods search for patterns affecting all time series and can therefore miss patterns spanning over only a subset of the series. Taking a different approach, each time series is first processed separately. Subsequences are categorized, and each data point is labeled by its subsequence category. Continuously, labels for each time point are viewed as a set and a second learning step is performed on these sets to uncover overrepresented subsets, which are then transformed into patterns. We present COPPER, an algorithm based on this framework and on an analogy between data point labels and words, patterns and topics. In COPPER a naive method is used for the first step and a Bayesian topic modeling algorithm for the second. We evaluate our algorithm on real and simulated sensor data. Experiments performed on simulated sensor data show that our algorithm significantly outperforms an algorithm developed for the setting most similar to ours (Subdimensional Motif Discovery), as well as the merit of our proposed score function  for hyperparameter optimization. In experiments performed on real sensor data our method produces intelligible and meaningful results. Finally, we introduce a method for measuring the significance of patterns and partial patterns, and propose a method for controlling the rate of false positives.