Technion - Israel Institute of Technology
The Graduate School
Ph.D. Thesis
Ph.D. Student: Leichter Ido
Subject: Visual Tracking in a General Context via Tracker Combination and Low-Level Cues
Department: Computer Science
Supervisors: Professor Ehud Rivlin, Professor Michael Lindenbaum
Full Thesis Text: English Version


Abstract

This thesis addresses the problem of visual tracking in video in a general context using two approaches. The first approach combines multiple trackers that use different features and therefore have different failure modes. A general framework is proposed for combining visual trackers that propagate filtering distributions over time. The individual trackers may propagate the filtering distributions either explicitly, for example via Kalman filtering, or through sample-sets of the distributions, via particle filtering. The proposed framework enables the combination of trackers with different state spaces, and in many cases it allows the individual trackers to be treated nearly as "black boxes." Another benefit of the framework is that it may be applied as is to combine trackers that track different, albeit related, targets. The framework was successfully tested with various state spaces and datasets.
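To make the sample-set (particle) representation of a filtering distribution concrete, the following is a minimal sketch and not the framework proposed in the thesis: it fuses two hypothetical trackers' weighted particle sets over a common 2-D position state space by re-weighting each set with a kernel density estimate built from the other. All names and parameters here (fuse_sample_sets, the Gaussian kernel, the bandwidth) are illustrative assumptions.

```python
import numpy as np

def kde_at(samples, weights, query_points, bandwidth=5.0):
    """Evaluate an (unnormalized) Gaussian kernel density estimate, built from
    one tracker's weighted sample-set, at another tracker's sample locations."""
    diffs = query_points[:, None, :] - samples[None, :, :]   # (M, N, d)
    sq_dist = np.sum(diffs ** 2, axis=-1)                     # (M, N)
    return np.exp(-0.5 * sq_dist / bandwidth ** 2) @ weights  # (M,)

def fuse_sample_sets(samples_a, weights_a, samples_b, weights_b, bandwidth=5.0):
    """Toy product-of-densities fusion of two particle representations of
    filtering distributions over the same state space (not the thesis's method)."""
    # Re-weight each tracker's particles by the density the *other* tracker assigns there.
    w_a = weights_a * kde_at(samples_b, weights_b, samples_a, bandwidth)
    w_b = weights_b * kde_at(samples_a, weights_a, samples_b, bandwidth)
    samples = np.vstack([samples_a, samples_b])
    weights = np.concatenate([w_a, w_b])
    return samples, weights / weights.sum()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two hypothetical trackers, each holding 200 particles over a 2-D (x, y) target position.
    particles_a = rng.normal([100.0, 80.0], 4.0, size=(200, 2))
    particles_b = rng.normal([104.0, 78.0], 6.0, size=(200, 2))
    uniform = np.full(200, 1.0 / 200)
    fused_samples, fused_weights = fuse_sample_sets(particles_a, uniform, particles_b, uniform)
    print("fused position estimate:", fused_weights @ fused_samples)
```

Combining trackers with different state spaces, as the thesis's framework supports, would additionally require relating the spaces to one another; that is beyond this toy sketch.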


The second approach employs basic, low-level visual characteristics that are typically valid. A tracker of object bitmaps (silhouettes) is proposed that uses no prior information about the target or the scene. The low-level characteristics employed are (short-term) Constancy of Color - the color projected to the camera from a point on a surface remains approximately the same in consecutive video frames; Spatial Motion Continuity - the optical flow of the vast majority of the pixels in an image region corresponding to an object is spatially continuous; and Spatial Color Coherence - adjacent pixels of similar color are highly likely to belong to the same object. Because the tracker relies only on these basic visual characteristics, it is applicable in a very general context. The tracker works by approximating, in each video frame, the probability distribution function of the target's bitmap and then estimating the maximum a posteriori (MAP) bitmap.
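As a deliberately simplified illustration of MAP bitmap estimation, the sketch below computes a pixel-independent MAP labeling from per-pixel foreground/background likelihoods and a prior taken, say, from the previous frame's bitmap. It is not the thesis's algorithm, which couples neighboring pixels through the motion-continuity and color-coherence cues; all names are hypothetical.

```python
import numpy as np

def pixelwise_map_bitmap(p_obs_given_fg, p_obs_given_bg, prior_fg):
    """Toy pixel-independent MAP bitmap estimate.

    p_obs_given_fg / p_obs_given_bg: per-pixel likelihoods of the observed
    color under the foreground / background hypotheses (H, W arrays).
    prior_fg: per-pixel prior probability of foreground, e.g. derived from
    the previous frame's bitmap (H, W array).
    """
    post_fg = p_obs_given_fg * prior_fg
    post_bg = p_obs_given_bg * (1.0 - prior_fg)
    # Each pixel independently takes its most probable label; the thesis's
    # tracker instead maximizes a posterior that couples neighboring pixels.
    return (post_fg > post_bg).astype(np.uint8)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    h, w = 4, 6
    like_fg = rng.uniform(size=(h, w))
    like_bg = rng.uniform(size=(h, w))
    prior = np.full((h, w), 0.3)  # hypothetical prior from the previous bitmap
    print(pixelwise_map_bitmap(like_fg, like_bg, prior))
```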


Kernel-based trackers also fall under the second approach, and two are proposed. The first exploits the constancy of color and the presence of color edges along the target boundary, using these two visual cues to affinely transform the kernel over time. The second enhances the Mean Shift tracker to use multiple color histograms obtained from different views of the target, which makes the Mean Shift tracker suitable for tracking objects whose colors, as revealed to the camera, change over time. Both kernel-based trackers were experimentally shown to cope with tracking scenarios in which traditional kernel-based trackers fail.
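The following sketch illustrates one generic ingredient of multi-view kernel-based tracking: representing the target by several color histograms, one per stored view, and scoring a candidate region against each with the Bhattacharyya coefficient, the similarity measure used by Mean Shift-style trackers. It is not the enhanced Mean Shift tracker proposed in the thesis, and the function names are assumptions.

```python
import numpy as np

def color_histogram(patch, bins=8):
    """Normalized joint RGB histogram of an image patch (H, W, 3), dtype uint8."""
    idx = (patch.astype(np.int64) // (256 // bins)).reshape(-1, 3)
    flat = (idx[:, 0] * bins + idx[:, 1]) * bins + idx[:, 2]
    hist = np.bincount(flat, minlength=bins ** 3).astype(np.float64)
    return hist / hist.sum()

def bhattacharyya(p, q):
    """Bhattacharyya coefficient between two normalized histograms."""
    return float(np.sum(np.sqrt(p * q)))

def best_reference_view(candidate_patch, reference_hists):
    """Score the candidate region against every stored target-view histogram
    and return the index and similarity of the best match; a full tracker
    would then run its Mean Shift iterations against that reference."""
    cand = color_histogram(candidate_patch)
    scores = [bhattacharyya(cand, ref) for ref in reference_hists]
    best = int(np.argmax(scores))
    return best, scores[best]

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    # Hypothetical reference views (e.g., front/side/back) and a current candidate region.
    views = [rng.integers(0, 256, size=(40, 30, 3), dtype=np.uint8) for _ in range(3)]
    references = [color_histogram(v) for v in views]
    candidate = views[1]  # pretend the camera currently sees something like view 1
    print(best_reference_view(candidate, references))
```

Blending several reference histograms, rather than selecting a single best one, is another common design choice for handling gradual appearance change.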