M.Sc Thesis

M.Sc Student: Glaser Tamar
Subject: Incorporating Temporal Context in Bag-of-Words Models
Department: Department of Electrical and Computer Engineering
Supervisor: Prof. Lihi Zelnik-Manor


Bag-of-Words (BoW) is a highly popular model for recognition of objects, scenes, actions and more, owing to its robustness and simplicity. Its modeling capabilities, however, are somewhat limited, since it discards the spatial and temporal order of the codewords. Many previous works have incorporated spatial or temporal context into image and video representations in order to improve image and video analysis. Some of these works incorporated spatial context into the BoW model for a specific feature vector representation and specific applications. In this research we propose a new general model, Contextual Sequence of Words (CSoW), which incorporates temporal order into the BoW model for video representation. The CSoW model can replace the BoW model regardless of the feature vector representation. The temporal context is incorporated at three scales that capture different aspects of the variability between different performances of the same action. The model addresses both global temporal context (the absolute temporal location of parts of the signal) and relative temporal context (the temporal location of events and sub-events relative to one another). We evaluate CSoW on the task of action classification with several different feature vector representations, and show that using CSoW instead of BoW can lead to a significant improvement in action recognition rates across several different setups.
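
The limitation described above, that BoW discards temporal order, can be illustrated with a small sketch. This is not the CSoW model itself, whose construction is not given here. It is a toy example under stated assumptions: a video is taken to be a sequence of codeword indices, and temporal context is approximated by concatenating one histogram per fixed temporal segment. The function names `bow_histogram` and `temporal_bin_histograms`, the vocabulary size and the number of segments are all hypothetical choices made for illustration.

```python
import numpy as np

def bow_histogram(codewords, vocab_size):
    """Order-free BoW: count how often each codeword index occurs."""
    return np.bincount(codewords, minlength=vocab_size)

def temporal_bin_histograms(codewords, vocab_size, num_bins):
    """Illustrative only: concatenate one BoW histogram per temporal segment."""
    segments = np.array_split(np.asarray(codewords), num_bins)
    return np.concatenate([bow_histogram(seg, vocab_size) for seg in segments])

# Two toy codeword sequences with identical content in reverse temporal order.
seq_a = np.array([0, 0, 1, 1, 2, 2])
seq_b = seq_a[::-1]

# Plain BoW cannot tell the two sequences apart.
same_bow = np.array_equal(bow_histogram(seq_a, 3), bow_histogram(seq_b, 3))

# Histograms computed per temporal segment do distinguish them.
same_binned = np.array_equal(
    temporal_bin_histograms(seq_a, 3, 2),
    temporal_bin_histograms(seq_b, 3, 2),
)

print(same_bow, same_binned)
```

In this toy case the two sequences share the same BoW histogram, while the segment-wise histograms differ. Fixed segmentation captures only a coarse form of absolute temporal location. The relative ordering of events and sub-events, which the abstract also mentions, would need a different construction.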