|M.Sc Student||Divon Gilad|
|Subject||Viewpoint Estimation - insight and Model|
|Department||Department of Electrical Engineering||Supervisor||Professor Ayellet Tal|
This thesis addresses the problem of estimating the viewpoint of an object in a given image, where the objects belong to several known categories. Convolutional neural networks (CNNs) were recently applied to this problem, leading to large improvements over previous state-of-the-art results. Two major approaches have been pursued: a regression approach, which handles the continuous values of viewpoints naturally, and a classification approach, which discretizes the space of viewpoints. We follow the second approach and present five key insights that should be taken into consideration when designing a CNN that solves the problem. These insights concern all three components of any network: the architecture, the training data, and the loss function. Based on these insights, the thesis proposes a network in which:
(i) The architecture jointly solves detection, classification, and viewpoint estimation, using the most advanced CNN for performing the first two tasks.
(ii) New types of data are added and trained on, in order to address the shortage of labeled data. Specifically, we propose to utilize both flipped images and video clips.
(iii) A novel loss function is proposed, which takes into account both the geometry of the problem and the new types of data.
Our network improves the state-of-the-art results for this problem on PASCAL3D by 9.8%. The influence of each component is rigorously analyzed.
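To make the classification setting concrete, the following is a minimal illustrative sketch and not the thesis implementation. It shows how a continuous azimuth angle could be discretized into class bins, how the label of a horizontally flipped image might be derived, and how a circular distance between bins reflects the geometry of the problem. The bin count, the function names, and the convention that mirroring negates the azimuth are all assumptions made for illustration.

```python
NUM_BINS = 360  # assumption: one class per degree of azimuth


def azimuth_to_bin(azimuth_deg: float, num_bins: int = NUM_BINS) -> int:
    """Map a continuous azimuth (degrees) to a discrete class index."""
    width = 360.0 / num_bins
    # The final modulo guards against float rounding that yields exactly 360.0.
    return int((azimuth_deg % 360.0) // width) % num_bins


def flip_azimuth(azimuth_deg: float) -> float:
    """Label for the horizontally mirrored image.

    Assumed convention: mirroring the image negates the azimuth.
    """
    return (-azimuth_deg) % 360.0


def circular_bin_distance(a: int, b: int, num_bins: int = NUM_BINS) -> int:
    """Distance between two bins on the circle, so bins 359 and 0 are neighbors."""
    d = abs(a - b) % num_bins
    return min(d, num_bins - d)
```

A loss that is aware of the geometry could, for example, penalize a prediction according to `circular_bin_distance` rather than treating all wrong bins as equally wrong; whether the thesis does exactly this is not stated in the abstract.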