M.Sc Thesis | |
---|---|
M.Sc Student | Aharon Michal |
Subject | Representation Analysis and Synthesis of Lip Images Using Dimensionality Reduction |
Department | Department of Computer Science |
Supervisors | Prof. Michael Elad, Prof. Ron Kimmel |

Understanding facial expressions in image sequences is an easy task for humans. Some of us are capable of lipreading by interpreting the different motions of the mouth. Automatic lipreading by a computer is a challenging task that has so far met with limited success. The inverse problem of synthesizing realistic-looking lip movements is also highly non-trivial. Today, the technology for automatically generating an image series that imitates natural postures is far from perfect.

We introduce a new framework for facial image representation, analysis, and synthesis (here we refer only to the lower half of the face, with a focus on the mouth). It includes interpretation and classification of facial expressions and visual speech recognition, as well as a synthesis procedure for facial expressions that yields natural-looking facial movements.

Our facial image analysis and synthesis processes are based on a parameterization of the set of mouth-configuration images. These images are represented as points on a two-dimensional flat manifold, such that the Euclidean distance between any two points on the plane is as close as possible to the dissimilarity between the two corresponding images. This representation is obtained using a weighted dimensionality reduction method. It lets us efficiently define the pronunciation of each word as a contour on the plane and thereby analyze or synthesize the motion of the lips.
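
To make the embedding idea concrete, the following is a minimal sketch of one common way to place items on a plane so that Euclidean distances approximate given dissimilarities: weighted stress minimization with SMACOF-style majorization steps. It is an illustration under stated assumptions, not necessarily the method used in the thesis. The function names, the toy data standing in for lip images, and the weighting `1 / (1 + D)` are all hypothetical choices made for this example.

```python
import numpy as np

def pairwise_dist(X):
    """Euclidean distance between every pair of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def weighted_stress(X, D, W):
    """Weighted squared mismatch between planar distances and dissimilarities.

    The factor 0.5 compensates for counting each unordered pair twice.
    """
    return 0.5 * np.sum(W * (pairwise_dist(X) - D) ** 2)

def weighted_mds(D, W, X0, n_iter=300):
    """Minimize the weighted stress with SMACOF (Guttman transform) updates.

    D  : (n, n) symmetric dissimilarities with zero diagonal
    W  : (n, n) symmetric non-negative weights with zero diagonal
    X0 : (n, dim) initial coordinates
    """
    # V has -w_ij off the diagonal and the row sums of W on the diagonal.
    V = -W
    np.fill_diagonal(V, 0.0)
    np.fill_diagonal(V, -V.sum(axis=1))
    V_pinv = np.linalg.pinv(V)

    X = X0.copy()
    for _ in range(n_iter):
        dist = pairwise_dist(X)
        # ratio_ij = d_ij / dist_ij, left at 0 where two points coincide.
        ratio = np.zeros_like(dist)
        np.divide(D, dist, out=ratio, where=dist > 0)
        B = -W * ratio
        np.fill_diagonal(B, 0.0)
        np.fill_diagonal(B, -B.sum(axis=1))
        X = V_pinv @ B @ X  # one majorization step; stress never increases
    return X

# Toy data (assumption): random planar points stand in for image descriptors.
rng = np.random.default_rng(0)
P = rng.standard_normal((20, 2))
D = pairwise_dist(P)

# Hypothetical weighting that emphasizes small dissimilarities.
W = 1.0 / (1.0 + D)
np.fill_diagonal(W, 0.0)

X0 = rng.standard_normal((20, 2))
X = weighted_mds(D, W, X0)
```

Each row of `X` is a point on the plane. In the setting described above, a word would correspond to the sequence of such points visited by its frames, and comparing `weighted_stress(X0, D, W)` with `weighted_stress(X, D, W)` shows how much the iterations improved the fit.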

We present some examples of automatic lip-motion synthesis and lipreading, and propose a generalization of our solution to the problem of lipreading different subjects.