M.Sc Thesis

M.Sc Student: Borzin Artyom
Subject: Visual Event Representation and Behavior Analysis Using Generalized Stochastic Petri Nets
Department: Department of Electrical and Computers Engineering
Supervisor: Prof. Ehud Rivlin


Recent research interest in video event representation and event recognition is driven mainly by the wide range of potential applications of human activity recognition, such as surveillance and content-based video indexing. Surveillance applications have become common, with cameras now installed in stores, airports, parking lots, banks and other public places. Multi-agent interaction is an important branch of activity recognition. Although gesture and individual action recognition have seen relative success, significant work remains to be done for multi-agent activities.

This work presents a novel approach to video event representation and recognition of multi-agent interactions. The proposed approach uses Generalized Stochastic Petri Nets (GSPN) as a behavior modeling formalism and introduces Petri net marking analysis for better scene understanding. The GSPN model provides remarkable flexibility in representing time-dependent activities, which usually coexist with logical, spatial and temporal relations in real-life scenes.

Petri nets allow efficient modeling of complex sequential and simultaneous activities but disregard the behavior history of the actors involved. The proposed modeling approach benefits from marking-dependent transition annotations and analysis. This powerful modeling extension allows model parameters to depend on the current marking and provides probabilistic analysis of possible future states of the represented system.
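To make the idea of marking-dependent parameters concrete, the following is a minimal sketch and not the implementation used in the thesis. It assumes only timed transitions with exponentially distributed firing delays, in which case the probability that an enabled transition fires next is its rate divided by the sum of the rates of all enabled transitions. Immediate transitions, priorities and weights, which a full GSPN also includes, are omitted. The place names (`person_at_car`, `prior_visits`, and so on), the transition names and the rate values are hypothetical and chosen purely for illustration.

```python
from typing import Callable, Dict, List

Marking = Dict[str, int]  # place name -> number of tokens


class Transition:
    """A timed transition whose firing rate may depend on the marking."""

    def __init__(self, name: str, inputs: Marking, outputs: Marking,
                 rate: Callable[[Marking], float]) -> None:
        self.name = name
        self.inputs = inputs    # tokens consumed from each input place
        self.outputs = outputs  # tokens produced in each output place
        self.rate = rate        # marking-dependent rate function

    def enabled(self, marking: Marking) -> bool:
        return all(marking.get(p, 0) >= n for p, n in self.inputs.items())

    def fire(self, marking: Marking) -> Marking:
        """Return the marking reached by firing this transition."""
        new = dict(marking)
        for p, n in self.inputs.items():
            new[p] = new.get(p, 0) - n
        for p, n in self.outputs.items():
            new[p] = new.get(p, 0) + n
        return new


def next_transition_probabilities(transitions: List[Transition],
                                  marking: Marking) -> Dict[str, float]:
    """Probability that each enabled transition fires first (race policy)."""
    rates = {t.name: t.rate(marking) for t in transitions if t.enabled(marking)}
    total = sum(rates.values())
    if total <= 0:
        return {}
    return {name: r / total for name, r in rates.items()}


# Hypothetical scene: a person standing next to a car either enters it or
# walks away. The walk-away rate grows with the number of earlier visits,
# which is recorded as tokens in the (hypothetical) place "prior_visits".
transitions = [
    Transition("enter_car", {"person_at_car": 1}, {"person_in_car": 1},
               rate=lambda m: 3.0),
    Transition("walk_away", {"person_at_car": 1}, {"person_gone": 1},
               rate=lambda m: 1.0 + 2.0 * m.get("prior_visits", 0)),
]

first_visit = {"person_at_car": 1, "prior_visits": 0}
repeat_visit = {"person_at_car": 1, "prior_visits": 1}

print(next_transition_probabilities(transitions, first_visit))
print(next_transition_probabilities(transitions, repeat_visit))
```

In this sketch the same net structure yields different predictions for the two markings: the extra token in `prior_visits` raises the likelihood of `walk_away`, which is one way a marking can carry information about earlier behavior into the analysis of possible future states.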

The GSPN approach is evaluated using the surveillance system developed in this work, which provides a user interface for creating behavior models, recognizes events in videos, and produces a textual description of the detected behavior. The proposed concepts were tested on both real and synthetic video scenes. The experimental results illustrate the system's ability to model complex spatiotemporal and logical relations and to recognize the interactions of multiple objects in various video scenes using GSPN and marking analysis capabilities.