Technion - Israel Institute of Technology - Graduate School
M.Sc. Thesis
M.Sc. Student: Nurit Spingarn
Subject: Speaker Diarization Using GMM Mean Supervector and Advanced Dimensionality Reduction Algorithms
Department: Electrical Engineering
Supervisor: Full Professor Cohen Israel


Abstract

Speaker diarization is the task of tagging the different speakers within an unlabeled speech sequence. Speaker diarization has attracted significant research effort over the last decade. Nevertheless, diarization of short utterances (2-5 seconds), particularly under noisy conditions, remains very challenging. In this thesis, we introduce a speaker diarization system based on three components: voice activity detection (VAD), utterance representation, and spectral clustering.

Traditional VADs suffer from high false detection rates under noisy conditions. We have therefore implemented a VAD designed specifically for noisy environments. It is based on the extraction of specialized features, graph embedding, and a Laplacian pyramid representation.
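The abstract does not specify the VAD features or the exact graph embedding. Purely as an illustration, the following minimal sketch embeds frame-level feature vectors with a diffusion map, one common form of graph embedding. The generic per-frame features and the kernel width `epsilon` are assumptions, the function name `diffusion_map` is hypothetical, and the Laplacian pyramid stage is not shown.

```python
import numpy as np


def diffusion_map(features, epsilon, n_components=2, t=1):
    """Embed frame-level feature vectors with a diffusion map (illustrative sketch).

    features: array of shape (n_frames, n_features); the feature set is an
    assumption, not the one used in the thesis.
    epsilon: Gaussian kernel width (hypothetical tuning parameter).
    """
    # Pairwise squared Euclidean distances between frames.
    sq_norms = np.sum(features ** 2, axis=1)
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * features @ features.T
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against small negative round-off

    # Gaussian affinity graph over frames.
    W = np.exp(-sq_dist / epsilon)

    # Symmetric normalization D^{-1/2} W D^{-1/2}; it has the same eigenvalues
    # as the row-stochastic Markov matrix P = D^{-1} W.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    A = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order; reorder to descending.
    eigvals, eigvecs = np.linalg.eigh(A)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Right eigenvectors of P are psi = D^{-1/2} v.
    psi = d_inv_sqrt[:, None] * eigvecs

    # Drop the trivial constant eigenvector (eigenvalue 1) and scale by lambda^t.
    return psi[:, 1:n_components + 1] * eigvals[1:n_components + 1] ** t
```

Frames from well-separated groups (for example, speech and noise-only frames) land on opposite sides of the first diffusion coordinate. Any decision rule applied in this embedded space would be a further assumption.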

State-of-the-art speaker diarization systems are usually based on statistical models. Their segmentation stage is performed by a speaker change point detector that employs the Bayesian information criterion (BIC): a penalized likelihood ratio test (LRT) detects a speaker change point within a sliding window. The different speakers are then tagged by agglomerative hierarchical clustering, which also assumes a statistical model for each segment. For short utterances, these statistical models are unreliable. As a result, state-of-the-art systems generally perform poorly both in detecting speaker change points and in avoiding confusion between speakers.
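The BIC test used by these baseline systems can be sketched in its widely used full-covariance Gaussian form. For a window of N frames of dimension d, split at frame i into N1 and N2 frames: ΔBIC(i) = (N/2) log|Σ| - (N1/2) log|Σ1| - (N2/2) log|Σ2| - λP, with P = ½(d + d(d+1)/2) log N. A change is declared where ΔBIC is positive and maximal. This is a minimal sketch, not the thesis implementation. The helper names, the `margin` parameter, and the use of MFCC-like features are assumptions.

```python
import numpy as np


def delta_bic(window, split, penalty_weight=1.0):
    """Delta-BIC for a candidate speaker change at frame index `split`.

    window: array of shape (N, d) of frame-level features (MFCCs are an
    assumed example). Each hypothesis models its data with full-covariance
    Gaussians; a positive value favours a change point.
    """
    n, d = window.shape

    def logdet_cov(x):
        # Maximum-likelihood covariance (bias=True); atleast_2d handles d == 1.
        cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
        _, logdet = np.linalg.slogdet(cov)
        return logdet

    # Log-likelihood ratio: one Gaussian for the whole window vs one per side.
    lrt = 0.5 * (n * logdet_cov(window)
                 - split * logdet_cov(window[:split])
                 - (n - split) * logdet_cov(window[split:]))
    # BIC penalty for the extra d mean and d(d+1)/2 covariance parameters.
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    return lrt - penalty_weight * penalty


def detect_change(window, margin, penalty_weight=1.0):
    """Scan all splits at least `margin` frames from each edge of the window.

    Returns the best split if its Delta-BIC is positive, otherwise None.
    `margin` should exceed d so that each side's covariance is non-singular.
    """
    candidates = range(margin, len(window) - margin)
    scores = [delta_bic(window, i, penalty_weight) for i in candidates]
    best = int(np.argmax(scores))
    return candidates[best] if scores[best] > 0 else None
```

The covariance estimates on each side need enough frames, which is why this test degrades for the short utterances discussed above.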

The proposed speaker diarization algorithm uses spectral clustering for the final tagging. Each speech segment is represented by a GMM mean supervector together with its first and second derivatives, which provide an additional cue for discriminating between speakers. These supervectors serve as the input to spectral clustering, which exploits the eigenvectors of the data's similarity matrix to perform dimensionality reduction. This technique captures the most informative eigenvectors, which improves both the clustering performance and the system's robustness to noise.
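The clustering stage can be illustrated with the standard Ng-Jordan-Weiss variant of spectral clustering. This is a minimal sketch, not the thesis algorithm. It assumes that the segment vectors (supervectors with their derivatives) are already built, that the number of speakers is known, and that the similarity matrix uses a Gaussian kernel with a hypothetical fixed width `sigma`. The function name is also hypothetical.

```python
import numpy as np


def spectral_clustering(vectors, n_speakers, sigma, n_iter=100, seed=0):
    """Cluster segment-level vectors (e.g. GMM mean supervectors) into speakers.

    Ng-Jordan-Weiss style sketch; the Gaussian affinity with width `sigma`
    and a known number of speakers are illustrative assumptions.
    """
    # Gaussian-kernel similarity matrix between segments, zero diagonal.
    sq = np.sum(vectors ** 2, axis=1)
    sq_dist = np.maximum(sq[:, None] + sq[None, :] - 2.0 * vectors @ vectors.T, 0.0)
    A = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)

    # Normalized affinity D^{-1/2} A D^{-1/2}; the floor avoids division by zero.
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(A.sum(axis=1), 1e-12))
    L = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

    # Dimensionality reduction: eigenvectors of the k largest eigenvalues.
    _, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    X = eigvecs[:, -n_speakers:]
    Y = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)

    # k-means on the rows of Y with farthest-point initialization.
    rng = np.random.default_rng(seed)
    centers = [Y[rng.integers(len(Y))]]
    for _ in range(1, n_speakers):
        dist = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)

    for _ in range(n_iter):
        # Assign each segment to its nearest center.
        labels = np.argmin(((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        # Recompute centers, keeping the old one if a cluster empties.
        new_centers = np.array([
            Y[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(n_speakers)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels
```

After the embedding and row normalization, segments from the same speaker collapse to nearby points on the unit sphere. This makes the final k-means step far easier than clustering the raw high-dimensional supervectors.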

Experimental results demonstrate a significant reduction in error rate compared with state-of-the-art and unsupervised diarization methods.