M.Sc. Thesis

M.Sc. Student: Dvorkind Gregory
Subject: Speaker Localization in a Reverberant and Noisy Environment
Department: Department of Electrical and Computer Engineering
Supervisor: Prof. Sharon Gannot


Determining the spatial position of a speaker is of growing interest in videoconferencing scenarios, where automated camera steering and tracking are required. A microphone array, which is usually used for speech enhancement in noisy environments, can be used for speaker localization as well. In this work we present a dual-step algorithm for source localization. In the first algorithmic step, time delay of arrival (TDOA) estimates of the speech signal are extracted from spatially separated microphone pairs. In the second algorithmic step, these estimates are evaluated to derive the source location.

We suggest that a sufficient quantity for TDOA estimation may be the ratio of the speaker's acoustic transfer functions (ATFs) to the two microphones of a pair, where the TDOA estimate is obtained from the location of the maximal peak of the corresponding impulse response. We present novel frequency-domain approaches for estimating the ATF ratio in a reverberant and noisy environment. Our methods rely on the quasi-stationarity of speech, on the assumption that the noise is stationary, and on the fact that speech and noise are uncorrelated. The decorrelation criterion yields a set of nonlinear equations with an inherent frequency permutation ambiguity, which we suggest resolving by exploiting the noise stationarity. Both batch and recursive forms of the methods are derived; the recursive solutions are important in tracking scenarios.
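The path from an ATF ratio to a TDOA can be illustrated with a short NumPy sketch. The thesis's own estimators use a decorrelation criterion and resolve a frequency permutation ambiguity; this sketch deliberately swaps in a simpler per-frequency least-squares fit that exploits the same two properties (speech nonstationarity across segments, stationary noise). The scene is a toy: power-modulated white noise stands in for speech, the second microphone receives a pure 5-sample delay (no reverberation), and the noise is white and independent across microphones. The function names, STFT parameters, and segment length are illustrative assumptions.

```python
import numpy as np

def stft(x, nfft=512, hop=256):
    """Hann-windowed STFT; rows are frames, columns are rFFT bins."""
    win = np.hanning(nfft)
    n_frames = 1 + (len(x) - nfft) // hop
    idx = np.arange(nfft)[None, :] + hop * np.arange(n_frames)[:, None]
    return np.fft.rfft(x[idx] * win, axis=1)

def atf_ratio_nonstationary(x1, x2, nfft=512, hop=256, seg=8):
    """LS estimate of G(w) = A2(w) / A1(w) from segment-wise PSD estimates.

    Per segment k:  Phi_21[k](w) = G(w) Phi_11[k](w) + u(w), where
    u(w) = Phi_v2v1(w) - G(w) Phi_v1v1(w) is constant because the noise is
    assumed stationary, while speech nonstationarity makes Phi_11[k] vary.
    """
    X1, X2 = stft(x1, nfft, hop), stft(x2, nfft, hop)
    n_seg = X1.shape[0] // seg
    X1 = X1[: n_seg * seg].reshape(n_seg, seg, -1)
    X2 = X2[: n_seg * seg].reshape(n_seg, seg, -1)
    phi11 = np.mean(np.abs(X1) ** 2, axis=1)      # (n_seg, bins), real
    phi21 = np.mean(X2 * np.conj(X1), axis=1)     # (n_seg, bins), complex
    # Per-bin LS regression of phi21 on [phi11, 1]; the slope is G(w).
    dp = phi11 - phi11.mean(axis=0)
    dc = phi21 - phi21.mean(axis=0)
    return np.sum(dp * dc, axis=0) / np.sum(dp ** 2, axis=0)

def tdoa_from_atf_ratio(G, nfft=512):
    """Signed lag (samples) of the largest peak of the relative impulse response."""
    g = np.fft.irfft(G, n=nfft)
    k = int(np.argmax(np.abs(g)))
    return k if k < nfft // 2 else k - nfft   # map circular lag to signed lag

# Toy scene: power-modulated white "speech", pure 5-sample delay, white noise.
rng = np.random.default_rng(0)
n, delay = 64000, 5
env = np.repeat(rng.uniform(0.1, 3.0, n // 2048 + 1), 2048)[:n]
s = env * rng.standard_normal(n)
x1 = s + 0.3 * rng.standard_normal(n)
x2 = np.concatenate([np.zeros(delay), s[:-delay]]) + 0.3 * rng.standard_normal(n)
est = tdoa_from_atf_ratio(atf_ratio_nonstationary(x1, x2))
print("estimated TDOA [samples]:", est)
```

A positive lag means the second microphone lags the first. In a reverberant room the relative impulse response has many taps, and the TDOA is still read off its dominant peak.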

For the second algorithmic step, we first evaluate the Cramér-Rao lower bound (CRLB) and show that in the usual TDOA-based localization scenario, where the inter-element spread of the microphone array is small relative to the source distance, estimating the source position in Cartesian coordinates is not practical. However, the azimuth and elevation angles, which carry the relevant information in camera-steering applications, can be estimated reliably. Several algorithms for angle estimation are presented.
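The CRLB argument can be reproduced numerically with a minimal sketch. It assumes independent Gaussian TDOA errors with a common standard deviation on three microphone pairs that share a reference microphone (in practice such pairs are correlated), a speed of sound of 343 m/s, and a hypothetical geometry: four microphones with 0.2 m spacing and a source 5 m away. None of these values are taken from the thesis. The spherical-coordinate bound is obtained from the Cartesian TDOA Jacobian by the chain rule.

```python
import numpy as np

C = 343.0  # speed of sound [m/s] (assumed)

def jac_cartesian(p, mics, pairs):
    """d tau / d p for tau_ij = (|p - m_i| - |p - m_j|) / C; shape (P, 3)."""
    u = (p - mics) / np.linalg.norm(p - mics, axis=1)[:, None]
    return np.array([(u[i] - u[j]) / C for i, j in pairs])

def sph_to_cart(r, az, el):
    return r * np.array([np.cos(el) * np.cos(az), np.cos(el) * np.sin(az), np.sin(el)])

def jac_sph(r, az, el):
    """d p / d (r, az, el); shape (3, 3)."""
    ce, se, ca, sa = np.cos(el), np.sin(el), np.cos(az), np.sin(az)
    return np.array([
        [ce * ca, -r * ce * sa, -r * se * ca],
        [ce * sa,  r * ce * ca, -r * se * sa],
        [se,       0.0,          r * ce],
    ])

def crlb_std(J, sigma):
    """Square roots of the CRLB diagonal for i.i.d. Gaussian TDOA errors."""
    fim = J.T @ J / sigma**2
    return np.sqrt(np.diag(np.linalg.inv(fim)))

# Hypothetical setup: 0.2 m array, source 5 m away, 20 us TDOA error std.
mics = np.array([[0, 0, 0], [0.2, 0, 0], [0, 0.2, 0], [0, 0, 0.2]], dtype=float)
pairs = [(0, 1), (0, 2), (0, 3)]
r, az, el = 5.0, np.deg2rad(30.0), np.deg2rad(10.0)
sigma = 20e-6

Jc = jac_cartesian(sph_to_cart(r, az, el), mics, pairs)
std_xyz = crlb_std(Jc, sigma)                              # x, y, z [m]
std_r, std_az, std_el = crlb_std(Jc @ jac_sph(r, az, el), sigma)
print("x,y,z std [m]:", std_xyz)
print("range std [m]:", std_r)
print("az, el std [deg]:", np.rad2deg(std_az), np.rad2deg(std_el))
```

Because the first column of the spherical Jacobian is the unit line-of-sight vector and the other two columns are orthogonal to it, the Cartesian uncertainty along the line of sight equals the range uncertainty. The large range error therefore leaks into x, y, and z, while the angles remain well determined.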

We begin with Gauss iterations, and then observe that, because the speaker's trajectory is smooth, estimates of temporally adjacent speaker positions can be used to improve the current position estimate. Two localization algorithms that exploit this temporal dependency are suggested: the first is a recursive form of the Gauss method, and the second is the well-known extended Kalman filter (EKF).
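Both ideas can be sketched for angle-only estimation: a Gauss-Newton solver (one common form of Gauss iterations) for a single TDOA vector, and an EKF that tracks (azimuth, elevation) over time. This is not the thesis's recursive Gauss method. The assumptions are a far-field model tau = D u(az, el) / c (rows of D are microphone-position differences), a random-walk state model with process-noise variance `q`, i.i.d. Gaussian TDOA noise, and a hypothetical four-microphone geometry; measurements are synthesized from the model itself. All names and numeric values are illustrative.

```python
import numpy as np

C = 343.0  # assumed speed of sound [m/s]

def unit(theta):
    az, el = theta
    return np.array([np.cos(el) * np.cos(az), np.cos(el) * np.sin(az), np.sin(el)])

def h(theta, D):
    """Far-field TDOA model: tau = D @ u(az, el) / C, rows of D are m_j - m_i."""
    return D @ unit(theta) / C

def jac(theta, D):
    az, el = theta
    du_daz = np.array([-np.cos(el) * np.sin(az), np.cos(el) * np.cos(az), 0.0])
    du_del = np.array([-np.sin(el) * np.cos(az), -np.sin(el) * np.sin(az), np.cos(el)])
    return D @ np.column_stack([du_daz, du_del]) / C

def gauss_newton(tau, D, theta0, n_iter=20):
    """Iterate theta <- theta + LS solution of J * step = residual."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        step, *_ = np.linalg.lstsq(jac(theta, D), tau - h(theta, D), rcond=None)
        theta = theta + step
    return theta

def ekf_step(x, P, z, D, q, sigma):
    P = P + q * np.eye(2)                         # predict: random-walk angles
    H = jac(x, D)                                 # linearize at the prediction
    S = H @ P @ H.T + sigma**2 * np.eye(len(z))   # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)                # Kalman gain
    x = x + K @ (z - h(x, D))                     # measurement update
    P = (np.eye(2) - K @ H) @ P
    return x, P

rng = np.random.default_rng(1)
mics = np.array([[0, 0, 0], [0.2, 0, 0], [0, 0.2, 0], [0, 0, 0.2]], dtype=float)
D = mics[1:] - mics[0]            # pairs (0, j): tau_0j ~ (m_j - m_0) . u / C
sigma = 20e-6                     # assumed TDOA noise std [s]

truth = np.deg2rad([30.0, 10.0])
theta_gn = gauss_newton(h(truth, D), D, theta0=[0.0, 0.0])   # noise-free check

# Speaker sweeps azimuth from 30 to 60 degrees at a fixed 10-degree elevation.
n_steps = 200
az_path = np.deg2rad(np.linspace(30.0, 60.0, n_steps))
x, P, errs = None, None, []
for k in range(n_steps):
    th = np.array([az_path[k], np.deg2rad(10.0)])
    z = h(th, D) + sigma * rng.standard_normal(3)
    if x is None:                 # initialize the filter with Gauss-Newton
        x, P = gauss_newton(z, D, [0.0, 0.0]), 1e-2 * np.eye(2)
    else:
        x, P = ekf_step(x, P, z, D, q=1e-4, sigma=sigma)
    errs.append(np.rad2deg(np.abs(x - th)))
errs = np.array(errs)
print("mean |error| over last 50 steps [deg]:", errs[-50:].mean(axis=0))
```

The random-walk state model is the simplest choice; a constant-velocity state would reduce the tracking lag on a moving speaker at the cost of two extra states.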

The mathematical derivations in this work are supported by an extensive experimental study involving both static and tracking scenarios.