Ph.D. Thesis

Ph.D. Student: Ran Margolin
Subject: On Saliency Estimation and Scene Classification
Department: Department of Electrical and Computer Engineering
Supervisors: Prof. Ayellet Tal
             Prof. Lihi Zelnik-Manor


Every picture tells a story. In photography, the story is portrayed by a composition of objects; were we to remove these objects, the story would be lost. When manipulating images, it is crucial that the story of the piece remain intact, so knowing the locations of these objects is essential. We begin by proposing an approach for saliency detection that combines previously suggested patch distinctness with an object probability map. The object probability map infers the most probable locations of the prominent objects according to highly distinct salient cues.
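One simple way to realize this combination (a minimal sketch under our own assumptions, not the thesis algorithm itself) is to weight a per-pixel distinctness map by an object probability map placed around the most distinct pixels, here modeled as a Gaussian centered at their centroid; the function name and parameters are illustrative:

```python
import numpy as np

def saliency_with_object_prior(distinctness, top_frac=0.05, sigma_frac=0.3):
    """Illustrative sketch: weight a distinctness map by a Gaussian
    object-probability map centred at the centroid of the most
    distinct pixels (the 'highly distinct salient cues')."""
    h, w = distinctness.shape
    # Take the top fraction of pixels as cues for the object location.
    thresh = np.quantile(distinctness, 1.0 - top_frac)
    ys, xs = np.nonzero(distinctness >= thresh)
    cy, cx = ys.mean(), xs.mean()
    # Gaussian object-probability map around the cue centroid.
    yy, xx = np.mgrid[0:h, 0:w]
    sigma = sigma_frac * max(h, w)
    prob = np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * sigma ** 2))
    return distinctness * prob
```

The product suppresses distinct-but-isolated background pixels far from the inferred object location while preserving distinctness near it.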

Next, we investigate what makes an object salient. Most previous work asserts that distinctness is the dominating factor; the various algorithms differ in how they compute it. Some focus on patterns, others on colors, and several add high-level cues and priors.

We propose a simple, yet powerful, algorithm that integrates these three factors. Our key contribution is a novel and fast approach to compute pattern distinctness. We rely on the inner statistics of the patches in the image to identify unique patterns. We show that our approach outperforms all state-of-the-art methods on the five most commonly-used datasets.
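The idea of exploiting the inner statistics of an image's patches can be sketched as follows (a minimal illustration under our own assumptions, not the exact thesis algorithm): a patch is considered distinct if it lies far from the average patch along the principal components of all patches in the image.

```python
import numpy as np

def pattern_distinctness(patches):
    """Illustrative sketch: score each patch (one row per vectorised
    patch) by its L1 distance to the average patch, measured in the
    PCA basis computed from the image's own patches."""
    mean = patches.mean(axis=0)
    centred = patches - mean
    # Principal components from the patches' inner statistics.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    coords = centred @ vt.T            # coordinates in the PCA basis
    return np.abs(coords).sum(axis=1)  # L1 distance to the average patch
```

Because the basis is computed from the image itself, repeated textures collapse near the average patch and receive low scores, while rare patterns stand out.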

We then broaden our scope beyond saliency maps and examine the measures used to evaluate foreground maps. Many computer-vision algorithms output either binary or non-binary foreground maps, and several measures have been suggested to evaluate the accuracy of these maps.

In our work, we show that the most commonly-used measures do not always provide a reliable evaluation. We start by identifying three causes of inaccurate evaluation. We then propose a new measure that amends these flaws. An appealing property of our measure is that it is an intuitive generalization of the F-measure.
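For reference, the standard F-measure being generalized is the weighted harmonic mean of precision and recall of a binary foreground map against binary ground truth (a minimal sketch for the binary case; the thesis measure's extension to non-binary maps is not reproduced here):

```python
import numpy as np

def f_measure(pred, gt, beta2=1.0):
    """Standard F-measure of a binary prediction map against binary
    ground truth: (1 + beta^2) * P * R / (beta^2 * P + R)."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    precision = tp / max(pred.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    if precision + recall == 0:
        return 0.0
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)
```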

We further propose four meta-measures for comparing the adequacy of evaluation measures, and show experimentally that our novel measure is preferable.

Finally, we tackle the task of scene classification: determining the type of scene in which a photograph was taken. In our work we present a novel local descriptor suited to this task: Oriented Texture Curves (OTC). Our descriptor captures the texture of a patch along multiple orientations, while maintaining robustness to illumination changes, geometric distortions, and local contrast differences. We show that our descriptor outperforms all state-of-the-art descriptors for scene classification on the most extensive scene classification benchmark to date.
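The general idea of capturing a patch's texture along multiple orientations can be illustrated as follows (a generic oriented-derivative sketch of the idea only, not the OTC descriptor itself; OTC's curve-based construction is not reproduced here). Normalizing the response vector lends some robustness to global contrast changes:

```python
import numpy as np

def oriented_responses(patch, n_orientations=8):
    """Illustrative sketch: for each of n_orientations directions,
    accumulate the absolute directional derivative over the patch,
    then L2-normalise the resulting response vector."""
    gy, gx = np.gradient(patch.astype(float))  # per-pixel gradients
    responses = []
    for k in range(n_orientations):
        theta = np.pi * k / n_orientations
        # Directional derivative along orientation theta.
        d = gx * np.cos(theta) + gy * np.sin(theta)
        responses.append(np.abs(d).sum())
    responses = np.array(responses)
    norm = np.linalg.norm(responses)
    return responses / norm if norm > 0 else responses
```

For a patch of vertical stripes, the horizontal orientation dominates the response vector, as expected of any orientation-sensitive texture description.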