Ph.D Thesis

Ph.D Student: Cohen Meir
Subject: Observing the Observers: Social Context Analysis Using Computer Vision
Department: Department of Computer Science
Supervisors: Prof. Ilan Shimshoni, Prof. Ehud Rivlin


It is common for multiple human observers to attend to a single point of interest. Mutual awareness activity refers to the dynamics of this social phenomenon. A peak of mutual awareness activity is known as a mutual awareness event and can be interpreted as a "buzz" event that draws the attention of many observers. A preferred way to monitor this phenomenon is with a camera that captures the human observers while they observe the activity in the scene. Our work studies the underlying geometric constraints of mutual awareness events and the related dynamics of mutual awareness activities. These constraints are reformulated in terms of image measurements, which are collected using existing face detection and head pose estimation algorithms. The constraints are then used in a method that (1) detects how many such points of interest exist, if any, (2) determines where each point is located, (3) identifies which observer attends to which point, (4) reports where each observer was, and when, while attending, and (5) tracks these quantities over a long period in an on-line manner.
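To make the idea of a shared point of interest concrete, the following is a minimal sketch, not the method of the thesis. It assumes a simplified setting that the thesis explicitly does not require: every observer's 2D position and gaze direction are already known in a common ground-plane frame. Under that assumption, a candidate point of interest is the point closest, in the least-squares sense, to all gaze lines. The function names `closest_point_to_gaze_lines` and `gaze_line_residuals` are hypothetical and chosen for illustration only.

```python
import math


def closest_point_to_gaze_lines(positions, directions):
    """Return the 2D point minimizing the summed squared perpendicular
    distance to a set of gaze lines.

    Assumption (not from the thesis): positions and directions are given
    in one shared 2D frame. Each gaze is treated as an infinite line, so
    the result is not checked to lie in front of the observers.
    """
    a11 = a12 = a22 = 0.0
    b1 = b2 = 0.0
    for (px, py), (dx, dy) in zip(positions, directions):
        norm = math.hypot(dx, dy)
        if norm == 0.0:
            raise ValueError("gaze direction must be non-zero")
        dx, dy = dx / norm, dy / norm
        # M = I - d d^T projects onto the normal of the gaze line.
        m11 = 1.0 - dx * dx
        m12 = -dx * dy
        m22 = 1.0 - dy * dy
        # Accumulate the normal equations: (sum M) x = sum (M p).
        a11 += m11
        a12 += m12
        a22 += m22
        b1 += m11 * px + m12 * py
        b2 += m12 * px + m22 * py
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("gaze lines are parallel; no unique point")
    # Solve the symmetric 2x2 system with Cramer's rule.
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a12 * b1) / det
    return x, y


def gaze_line_residuals(point, positions, directions):
    """Perpendicular distance from `point` to each observer's gaze line."""
    qx, qy = point
    out = []
    for (px, py), (dx, dy) in zip(positions, directions):
        norm = math.hypot(dx, dy)
        # Magnitude of the 2D cross product with the unit direction.
        out.append(abs((qx - px) * dy - (qy - py) * dx) / norm)
    return out


if __name__ == "__main__":
    # Three hypothetical observers whose gaze lines all pass through (2, 2).
    observers = [(0.0, 0.0), (4.0, 0.0), (2.0, -3.0)]
    gazes = [(2.0, 2.0), (-2.0, 2.0), (0.0, 5.0)]
    point = closest_point_to_gaze_lines(observers, gazes)
    print(point, gaze_line_residuals(point, observers, gazes))
```

In this toy setting, small residuals for a subset of observers would indicate that they plausibly attend to the same point, while a large residual would flag an observer who looks elsewhere. Real head pose estimates are noisy and, with an uncalibrated camera, are not available in a common metric frame, which is why the sketch should be read only as an illustration of the underlying geometric intuition.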

The suggested method is unsupervised and handles the general case of an uncalibrated camera, a general environment, and unconstrained activity in the scene. This is in contrast to other work on similar problems, which inherently assumes a known environment, a calibrated camera, or restricted activity in the scene.

In addition, the current work attaches social semantics to the detected mutual awareness activity. A deeper social interpretation is suggested by analyzing the spatiotemporal correlations between the mutual awareness activity and the activity in the scene, i.e., the dynamics of the events that occur in the observers' field of view. The statistics of these social interpretations are aggregated over a long period, yielding social characteristics of an individual observer, of the entire group, and of the activity in the scene.

The method was tested on about 75 images from various scenes. It was also tested on videos of interesting activities, including a single moving human observer fixating on a single static interest point, a panel of several participants, and a classroom with many observers. The method robustly detected mutual awareness events, tracked mutual awareness activities, estimated their related attributes, and linked them with various social interpretations.