Ph.D Thesis

Ph.D Student: Avraham Tamar
Subject: Visual Attention Processes based on Stochastic Models: Algorithms and Bounds
Department: Department of Computer Science
Supervisor: Prof. Michael Lindenbaum


Image analysis processes often scan the image exhaustively, looking for familiar objects at different locations and sizes. An attention mechanism that assigns priorities to different image parts, and thereby directs the analysis process to examine the more interesting locations first, can make the whole process much more efficient. Motivated by studies of the human visual system, this study focuses mainly on inner-scene similarity as a source of information and tests, from several perspectives, its usefulness for directing computer visual attention. A study of the inherent limitations of visual search algorithms suggests the COVER measure and shows that it can quantify the difficulty of a search task.
Taking a stochastic approach, we model the identities of the candidate image parts as a set of correlated random variables and derive two attention/search algorithms. The first algorithm, denoted VSLE (Visual Search using Linear Estimation), is a dynamic search procedure: each subsequent fixation is selected by combining inner-scene similarity information with the recognizer's feedback on previously attended sub-images. The second algorithm, denoted Esaliency (Extended Saliency), needs no recognition feedback and does not change the proposed priorities during the search. As a static algorithm, it can compete with previous attention mechanisms that also produce a pre-calculated saliency map for guiding the fixation order. Esaliency combines inner-scene similarity information with the common expectation that a scene contains a relatively small number of objects of interest and with the observation that image content has a relatively clustered structure. Unlike other accepted models of visual attention, which associate saliency with local uniqueness, Esaliency takes a global approach: a region is considered salient if there are few (or no) other similar regions in the whole scene. The algorithm uses a graphical model approximation that allows the target-location hypotheses with the highest likelihood to be revealed efficiently.
While our main goal is attention mechanisms for computer vision, we have also tested the relevance of our models for predicting human performance in a Cognitive Psychology study. We extended our models to account for internal noise and tested their predictive abilities on orientation-search and color-search tasks in which distractor homogeneity and target-distractor similarity were systematically manipulated. Compared with other prominent models of human visual search, the predictions of our models were the closest to actual human performance.
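As a rough illustration of the global notion of saliency described in the abstract (a region is salient when few or no other regions in the whole scene resemble it), the toy sketch below scores each candidate by counting how many other candidates lie within a similarity radius in feature space. It is not the Esaliency algorithm, which the abstract describes as probabilistic and based on a graphical model approximation. The function names, the Euclidean distance, the `radius` parameter, and the scoring formula are all assumptions made for this example.

```python
import math


def global_rarity_scores(features, radius):
    """Toy global-rarity score (illustrative assumption, not Esaliency).

    `features` is a list of equal-length numeric feature vectors, one per
    candidate region. A candidate scores higher when fewer *other*
    candidates anywhere in the scene lie within `radius` of it.
    """
    scores = []
    for i, f in enumerate(features):
        # Count the other candidates that look similar to candidate i.
        similar = sum(
            1
            for j, g in enumerate(features)
            if j != i and math.dist(f, g) <= radius
        )
        # No similar candidates gives 1.0; the score shrinks as they grow.
        scores.append(1.0 / (1.0 + similar))
    return scores


def fixation_order(scores):
    """Static priority list: visit candidates from highest to lowest score.

    The order is computed once and never updated, i.e. no recognizer
    feedback is used. Ties keep their original index order.
    """
    return sorted(range(len(scores)), key=lambda i: -scores[i])


# Hypothetical 1-D features: three mutually similar candidates, one outlier.
candidates = [[0.0], [0.1], [0.05], [5.0]]
scores = global_rarity_scores(candidates, radius=0.5)
order = fixation_order(scores)
```

In this hypothetical scene the outlier (index 3) has no similar candidates and is therefore examined first, while the three look-alike candidates share a lower score. A dynamic procedure in the spirit of VSLE would instead revise the priorities after each recognizer response, which this sketch does not attempt.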