|M.Sc Student||Yehonatan Dishon|
|Subject||Motion Cues for Gaze Prediction in Video|
|Department||Department of Electrical Engineering|
|Supervisors||Full Professor Tal Ayellet, Professor Zelnik-Manor Lihi|
This thesis proposes a novel method for gaze prediction in video. Gaze prediction in video has received increasing research attention in recent years, driven both by progress in visual saliency for still images and the adaptation of those ideas to video, and by efforts to exploit the characteristics unique to video content and its encoding. Our approach is based on the fundamental assertion that low-level features should provide a strong cue for saliency. We show that it is important to integrate spatial cues, similar to those used for image saliency estimation, with motion-based cues. We propose a method for computing motion features and show that using them improves prediction accuracy. In addition, we observe that temporal coherency can be used to stabilize and refine the predictions. We validate our method on two gaze-tracked video datasets and show that it outperforms the state of the art. In accordance with our assertion that low-level cues should suffice for saliency prediction, we show that using only low-level features yields results comparable to those of state-of-the-art algorithms that incorporate high-level semantics. The low-level features we use are orthogonal in nature to high-level cues; hence, augmenting our feature set with high-level cues further improves the prediction results.