M.Sc Thesis

M.Sc Student: Atias Avner
Subject: Video Target Tracking using Sweeping Experts
Department: Department of Electrical and Computer Engineering
Supervisor: Associate Prof. Moshe Porat


Target tracking is a difficult computer vision task. The difficulty stems from having to do numerically what the brain does cognitively, sometimes with little or no training at all: a single viewing is usually enough to start tracking, with no need to train on multiple examples of the tracked object. The human visual system keeps tracking even when part of the target is occluded, and if the target is fully occluded, tracking resumes as soon as part of the target reappears in the scene. Common noisy environments, sun reflections, color changes due to varying illumination, and turbulence do not usually inhibit the human ability to perform a tracking task.

A new machine vision framework for the tracking task is proposed in this thesis. The suggested framework strives to find features of the target and a way to identify them over time despite all these difficulties. For that, a new approach is required: acquisition instead of tracking. If a device can pinpoint the position of a selected object over time, then tracking becomes a byproduct of that acquisition. This calls for a view of the whole scene at every acquisition/tracking step. Accordingly, this work suggests an algorithm that captures the visual information from the whole scene and classifies the target within it.

The method suggested in this work is based on generating a feature vector for patches of the image. This feature vector smooths out noise, especially the small variations in pixel values caused by target motion. Its short length (fewer than 20 elements) enables efficient comparison and makes it possible to analyze the entire scene at every frame, not just a limited Area of Interest.

The scene is divided into equally sized patches, and each patch generates a feature vector by passing through a Self-Expanding Network (SEN). The SEN is a cubical network of nodes, each of which contains a signal. The signal of the patch is correlated with the signals contained in the nodes and paves its path through the network according to prescribed rules. The path is recorded and represents the patch as its SEN feature vector. The recorded path is then compared with paths generated in future frames. It is shown that this approach generates repeatable paths for similar signals and different paths for signals that are less similar.
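
The idea of a recorded path serving as a short feature vector can be illustrated with a minimal sketch. It is not the SEN itself: the prescribed routing rules and the self-expanding mechanism are not given here, so the sketch assumes a small fixed, layered network and a greedy rule (step to the best-correlated node in each layer). The names normalized_correlation, split_into_patches, path_through_network, path_similarity, LAYERS and FRAME, as well as all node signals and pixel values, are hypothetical.

```python
from math import sqrt


def normalized_correlation(a, b):
    """Zero-mean normalized correlation of two equal-length signals."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # a flat signal carries no shape to correlate with
    return sum(x * y for x, y in zip(da, db)) / denom


def split_into_patches(frame, size):
    """Cut a 2D list of pixel rows into size x size patches, flattened row by row."""
    patches = {}
    for top in range(0, len(frame) - size + 1, size):
        for left in range(0, len(frame[0]) - size + 1, size):
            patches[(top, left)] = [
                frame[top + r][left + c] for r in range(size) for c in range(size)
            ]
    return patches


def path_through_network(signal, layers):
    """Assumed routing rule: in each layer, step to the best-correlated node."""
    path = []
    for nodes in layers:
        scores = [normalized_correlation(signal, node) for node in nodes]
        path.append(scores.index(max(scores)))
    return path


def path_similarity(p, q):
    """Fraction of steps at which two equal-length paths agree."""
    return sum(1 for a, b in zip(p, q) if a == b) / len(p)


# Hypothetical 3-layer network of 2x2 node signals (flattened row by row).
LAYERS = [
    [[0, 0, 1, 1], [0, 1, 0, 1]],  # horizontal edge vs. vertical edge
    [[1, 0, 0, 1], [0, 1, 1, 0]],  # diagonal vs. anti-diagonal
    [[1, 0, 0, 0], [0, 0, 0, 1]],  # bright top-left vs. bright bottom-right
]

# Toy 2x6 frame holding three 2x2 patches; the third is a noisy copy of the first.
FRAME = [
    [10, 20, 10, 60, 11, 19],
    [50, 70, 15, 70, 52, 69],
]

patches = split_into_patches(FRAME, 2)
paths = {pos: path_through_network(sig, LAYERS) for pos, sig in patches.items()}

# Treat the first patch as the target and search the whole frame for its path.
target_path = paths[(0, 0)]
matches = [pos for pos, p in paths.items() if path_similarity(p, target_path) == 1.0]
print(paths)
print(matches)
```

In this toy frame the first patch and its noisy copy follow the same path, while the middle patch, which has a vertical rather than a horizontal edge, diverges at the first layer. Each path has three elements, so comparing every patch of the frame against the target stays cheap.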

The above description handles the classification of scene patches relative to previous frames. The proposed algorithm handles past target information using a management system and a decision process based on past feature vectors. On the one hand, it does not rely heavily on the past; on the other hand, it does not lose past information entirely. Independence between frames is also one of the method's merits. This independence enables the method to identify the target in the frame even in challenging cases, such as a target that jumps between frames because the camera is mounted on an unstable platform. Our conclusion is that this is a step forward toward machine target tracking that could cope with a larger variety of scenes than traditional tracking approaches.
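
One way to picture a memory that neither leans heavily on the past nor discards it is a bounded history with recency weighting. The sketch below is an assumption for illustration only, not the management system of the thesis: the class TargetHistory, its parameters max_len and decay, and the step-wise agreement score are all hypothetical.

```python
from collections import deque


class TargetHistory:
    """Bounded memory of past target paths with recency-weighted scoring."""

    def __init__(self, max_len=5, decay=0.5):
        # deque(maxlen=...) silently drops the oldest path once the memory is full
        self.paths = deque(maxlen=max_len)
        self.decay = decay

    def remember(self, path):
        """Store the (non-empty) path that was accepted as the target in this frame."""
        self.paths.append(tuple(path))

    def score(self, candidate):
        """Weighted mean of step-wise agreement; newer paths weigh more."""
        total = 0.0
        weight_sum = 0.0
        weight = 1.0
        for past in reversed(self.paths):  # newest first
            agree = sum(1 for a, b in zip(candidate, past) if a == b)
            total += weight * agree / len(past)
            weight_sum += weight
            weight *= self.decay
        return total / weight_sum if weight_sum else 0.0


history = TargetHistory(max_len=3, decay=0.5)
for past_path in ([0, 0, 1], [0, 0, 1], [0, 1, 1]):
    history.remember(past_path)

# A candidate that agrees with most of the history outscores an unrelated one.
close_score = history.score([0, 0, 1])
far_score = history.score([1, 1, 0])
print(close_score > far_score)

# Adding a fourth path pushes the oldest one out of the bounded memory.
history.remember([0, 1, 1])
print(len(history.paths))
```

Because the score depends only on stored paths and not on the previous target position, a candidate anywhere in the frame can be evaluated, which is in the spirit of the frame independence described above.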