Ph.D. Thesis

Ph.D. Student: Kviatkovsky Igor
Subject: Person Identification From Contextual Motion
Department: Department of Computer Science
Supervisors: Prof. Ehud Rivlin, Prof. Ilan Shimshoni


Body language has been shown to be one of the most powerful cues humans use to infer a person's identity and attributes such as age and gender.

Studies of human body language have been the motivating force behind computer vision systems for automatic person identification from motion.

Most efforts have focused on analyzing the discriminative properties of locomotion (e.g., gait) extracted from video sequences or motion capture (mocap) data. The growing popularity of depth camera sensors has enabled additional modalities, such as depth and skeleton data, to be used for this purpose.

The primary focus of this thesis is the problem of identifying people from their motion patterns, collected in different scenarios. Each scenario sets the context for these motion patterns through a different mechanism. We define three representative use cases, which we believe cover the vast majority of scenarios for person identification from motion. The interactive person identification use case, introduced here for the first time, addresses a novel natural user interface scenario that has gained popularity over the past few years. We present a generative probabilistic framework for inferring a person's identity from contextual motion and evaluate it on several publicly available datasets. The interactive scenario is evaluated on our own privately collected dataset, recorded with an interactive setup built specifically for this purpose.

Another related problem we address is fine-grained interaction analysis. More specifically, the goal is to predict the trajectory of an object the person is interacting with, based on that person's motion pattern alone. Possible applications of such an intent regression task can be found in assistive robotics and in the automatic analysis of sports events. We formalize the problem as a sequence labeling problem and present two possible solutions based on existing dynamic probabilistic graphical models.
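To illustrate what casting trajectory prediction as sequence labeling can look like, here is a minimal sketch of Viterbi decoding in a discrete hidden Markov model, one standard dynamic probabilistic graphical model. It assumes the object's trajectory has been discretized into K hypothetical labels (e.g., coarse motion directions) and that per-frame log-likelihoods of the person's motion features under each label are already available. Neither the discretization nor the choice of an HMM is taken from the thesis, whose two models may differ.

```python
import math
from typing import List, Sequence


def viterbi(obs_loglik: Sequence[Sequence[float]],
            log_trans: Sequence[Sequence[float]],
            log_prior: Sequence[float]) -> List[int]:
    """Most likely hidden label sequence of a discrete HMM (illustrative sketch).

    obs_loglik[t][k]: log p(motion features at frame t | label k)  (assumed given)
    log_trans[j][k]:  log p(label k at frame t | label j at frame t-1)
    log_prior[k]:     log p(label k at frame 0)
    """
    T, K = len(obs_loglik), len(log_prior)
    if T == 0:
        return []
    # score[k]: best log-probability of any label path ending in label k at the current frame
    score = [log_prior[k] + obs_loglik[0][k] for k in range(K)]
    backptr: List[List[int]] = []
    for t in range(1, T):
        new_score, ptr = [], []
        for k in range(K):
            # best predecessor label j for label k at frame t
            j_best = max(range(K), key=lambda j: score[j] + log_trans[j][k])
            new_score.append(score[j_best] + log_trans[j_best][k] + obs_loglik[t][k])
            ptr.append(j_best)
        score = new_score
        backptr.append(ptr)
    # backtrack from the best final label to recover the full label sequence
    path = [max(range(K), key=lambda k: score[k])]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    log = math.log
    prior = [log(0.5), log(0.5)]
    trans = [[log(0.9), log(0.1)], [log(0.1), log(0.9)]]  # "sticky" label dynamics
    obs = [[log(0.8), log(0.2)], [log(0.7), log(0.3)],
           [log(0.2), log(0.8)], [log(0.1), log(0.9)]]
    print(viterbi(obs, trans, prior))  # [0, 0, 1, 1]
```

The transition matrix is what distinguishes sequence labeling from independent per-frame classification: with uniform transitions the decoded path reduces to the per-frame argmax, while "sticky" transitions smooth the predicted label sequence over time.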

An additional contribution of this thesis is a real-time online action recognition algorithm for RGB data. The problem of low-latency action recognition from an incomplete spatio-temporal volume, under a very limited compute budget, arises in the domain of natural user interfaces. We propose an exceptionally low-dimensional descriptor that is efficiently computed from the raw RGB video stream. Additional optimizations significantly improve the algorithm's computational efficiency while keeping the recognition accuracy intact.

Finally, we present a theoretical proof of the equivalence of two widely known discriminative sparse dictionary learning algorithms, LC-KSVD and D-KSVD.

Although the result is general and not directly focused on the application domain addressed in this thesis, it contributes significantly to action recognition using sparse representations.
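For reference, the two objectives are sketched below in the simplified forms commonly cited in the literature; the notation is assumed here and not taken from the thesis. Y holds the training signals, D is the dictionary, X the sparse codes (columns x_i), H the class-label matrix, W a linear classifier, Q the target "discriminative" sparse codes, A a linear transform, T the sparsity level, and α, β, γ weighting constants. (The original D-KSVD formulation also includes a regularization term on W, omitted here.)

```latex
\text{D-KSVD:}\quad \min_{D,W,X}\ \|Y - DX\|_F^2 + \gamma \|H - WX\|_F^2
  \quad \text{s.t. } \|x_i\|_0 \le T \ \ \forall i

\text{LC-KSVD:}\quad \min_{D,A,W,X}\ \|Y - DX\|_F^2 + \alpha \|Q - AX\|_F^2 + \beta \|H - WX\|_F^2
  \quad \text{s.t. } \|x_i\|_0 \le T \ \ \forall i

% Both are typically optimized by running K-SVD on vertically stacked matrices, e.g. for D-KSVD:
\left\| \begin{bmatrix} Y \\ \sqrt{\gamma}\,H \end{bmatrix}
      - \begin{bmatrix} D \\ \sqrt{\gamma}\,W \end{bmatrix} X \right\|_F^2
  = \|Y - DX\|_F^2 + \gamma \|H - WX\|_F^2
```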