M.Sc Thesis

M.Sc Student: Aharon Michal
Subject: Representation Analysis and Synthesis of Lip Images Using Dimensionality Reduction
Department: Department of Computer Science
Supervisors: Prof. Michael Elad, Prof. Ron Kimmel


Understanding facial expressions in image sequences is an easy task for humans. Some of us can even lipread by interpreting the different motions of the mouth. Automatic lipreading by a computer is a challenging task that has so far met with limited success. The inverse problem of synthesizing realistic-looking lip movements is also highly non-trivial. Today, the technology for automatically generating an image sequence that imitates natural postures is far from perfect.
We introduce a new framework for facial image representation, analysis, and synthesis. Here, "facial image" refers only to the lower half of the face, with a focus on the mouth. The framework includes interpretation and classification of facial expressions and visual speech recognition, as well as a synthesis procedure for facial expressions that yields natural-looking facial movements.
Our facial image analysis and synthesis processes are based on a parameterization of the set of mouth configuration images. These images are represented as points on a two-dimensional flat manifold, such that the Euclidean distance between any two points on the plane is as close as possible to the dissimilarity between the two corresponding images. This representation is obtained using a weighted dimensionality reduction method. It enables us to efficiently define the pronunciation of each word as a contour, and thereby to analyze or synthesize the motion of the lips.
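
The idea of placing images on a plane so that Euclidean distances approximate pairwise dissimilarities can be illustrated with a small sketch. The code below uses classical (unweighted) multidimensional scaling as a stand-in; it is not the weighted method used in the thesis, and the function name `classical_mds` and the toy dissimilarity matrix are assumptions made only for illustration.

```python
import numpy as np

def classical_mds(D, dim=2):
    """Embed n items in `dim` dimensions from an n x n dissimilarity matrix D.

    Classical (unweighted) MDS, used here only as an illustrative stand-in
    for the weighted dimensionality reduction mentioned in the text.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Double-center the squared dissimilarities to get a Gram-like matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # B is symmetric, so eigh applies; keep the `dim` largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:dim]
    top_vals = np.clip(eigvals[order], 0.0, None)  # guard against tiny negatives
    return eigvecs[:, order] * np.sqrt(top_vals)

# Toy data (hypothetical): dissimilarities that happen to be exact planar
# distances, standing in for image-to-image dissimilarities.
rng = np.random.default_rng(0)
points = rng.normal(size=(6, 2))
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

Y = classical_mds(D, dim=2)  # one 2-D point per "image"
D_embedded = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
```

In this toy case the dissimilarities are exactly realizable in the plane, so the embedded distances match the input up to numerical error. With real image dissimilarities the match is only approximate, which is where a weighted formulation can prioritize the pairs that matter most.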
We present examples of automatic lip motion synthesis and lipreading, and propose a generalization of our solution to the problem of lipreading across different subjects.