M.Sc Thesis

M.Sc Student: Finkelstein Yehuda
Subject: Unsupervised Phoneme Alignment under Transient
Department: Department of Electrical and Computer Engineering
Supervisor: Associate Prof. Yacov Crammer


One requirement for researching and building spoken language systems is the availability of speech data that has been labeled and time-aligned at the phonetic level. Time-aligned phonetic labels can be created either by a trained human labeler or by an automatic method. Although manual phonetic alignment is considered more accurate than automatic methods, it is too time-consuming to be commonly used for aligning large corpora. Automatic methods for creating time-aligned phonetic labels can be classified into two main approaches: supervised and unsupervised. Most unsupervised approaches to phoneme alignment first generate descriptive spectral features over small time windows, and then set boundaries between sub-sequences such that the features within each sub-sequence are similar to each other. This approach fails under strong transient noise, such as the sound of a hammer, mouth clicks, or teeth chattering, which causes many algorithms to set the boundaries close to the noise bursts. We focus on the unsupervised approach under transient noise. We cast the alignment problem with noise using three objective functions. We describe an efficient implementation of algorithms that optimize these objectives, and perform an empirical study showing that our approach outperforms other methods and is also robust to the choice of parameters.
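
The generic unsupervised scheme described above (features per frame, then boundaries that keep each sub-sequence homogeneous) can be illustrated with a minimal sketch. This is not one of the three objective functions of the thesis; it is a plain within-segment squared-error criterion solved by dynamic programming, and the function name `segment`, the parameter `num_segments`, and the toy features are assumptions made purely for illustration.

```python
import numpy as np

def segment(features, num_segments):
    """Split a (T, D) feature sequence into contiguous segments.

    Illustrative baseline only: minimizes the total squared distance of
    each frame to the mean of its segment, for a fixed number of segments.
    Returns the interior boundary indices (start frame of segments 2..K).
    """
    x = np.asarray(features, dtype=float)
    T = x.shape[0]
    # Prefix sums give the cost of any segment in O(D) time.
    s1 = np.vstack([np.zeros(x.shape[1]), np.cumsum(x, axis=0)])
    s2 = np.concatenate([[0.0], np.cumsum((x ** 2).sum(axis=1))])

    def cost(i, j):
        # Squared error of frames i..j-1 around their own mean.
        seg_sum = s1[j] - s1[i]
        return (s2[j] - s2[i]) - seg_sum @ seg_sum / (j - i)

    # best[k, j]: minimal cost of covering frames 0..j-1 with k segments.
    best = np.full((num_segments + 1, T + 1), np.inf)
    back = np.zeros((num_segments + 1, T + 1), dtype=int)
    best[0, 0] = 0.0
    for k in range(1, num_segments + 1):
        for j in range(k, T + 1):
            for i in range(k - 1, j):
                c = best[k - 1, i] + cost(i, j)
                if c < best[k, j]:
                    best[k, j] = c
                    back[k, j] = i

    # Walk the back-pointers from the last segment to the second one.
    boundaries = []
    j = T
    for k in range(num_segments, 1, -1):
        j = back[k, j]
        boundaries.append(int(j))
    return boundaries[::-1]

# Toy one-dimensional "spectral" feature with three flat regions.
feats = np.array([0] * 4 + [5] * 3 + [10] * 5, dtype=float).reshape(-1, 1)
print(segment(feats, 3))
```

Under a criterion of this kind, a single frame with a very large feature value is expensive to keep inside any segment, so boundaries tend to be drawn next to it. That is consistent with the failure mode under transient noise described above.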