M.Sc Thesis | |
---|---|
M.Sc Student | Dvorkind Gregory |
Subject | Speaker Localization in a Reverberant and Noisy Environment |
Department | Department of Electrical and Computer Engineering |
Supervisor | Prof. Sharon Gannot |

Determining the spatial position of a speaker is of growing interest in video-conferencing scenarios, where automated camera steering and tracking are required. Microphone arrays, which are usually employed for speech enhancement in noisy environments, can be used for speaker localization as well. In this work we present a two-step algorithm for source localization. In the first step, *time delay of arrival* (TDOA) estimates of the speech signal are extracted from spatially separated microphone pairs. In the second step, these estimates are used to derive the source location.

For TDOA estimation, we suggest that the ratio of the speaker's *acoustic transfer functions* (ATFs) at the two microphones of a pair may be a sufficient quantity: the TDOA estimate is obtained from the location of the maximal peak in the corresponding impulse response. We present novel frequency-domain approaches for estimating the ATF ratio in a reverberant and noisy environment. Our methods rely on the quasi-stationarity of speech, on the assumed stationarity of the noise, and on the fact that speech and noise are uncorrelated. The decorrelation criterion results in a set of nonlinear equations with an inherent frequency-permutation ambiguity, which we resolve by exploiting the stationarity of the noise. Batch and recursive forms of the methods are derived; the recursive solutions are particularly important in tracking scenarios.
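
A minimal NumPy sketch of the first step follows. It does not implement the decorrelation-based estimator of this work or its permutation-ambiguity resolution; it swaps in a simpler, widely used least-squares ATF-ratio estimator that rests on the same assumptions. Within each segment `q` of a few STFT frames, the cross-PSD is modeled as `Phi_21(q, k) = H(k) * Phi_11(q, k) + u(k)`, where `H(k)` is the ATF ratio and `u(k)` is a constant term contributed by stationary noise that is uncorrelated with the speech. Because speech is nonstationary, `Phi_11` varies between segments, so a per-bin linear regression recovers `H(k)`; the TDOA is then taken from the peak of the inverse FFT of `H(k)`. All function names, frame sizes and the synthetic pure-delay scenario (no reverberation) are illustrative assumptions.

```python
import numpy as np


def stft(x, nfft=512, hop=256):
    """Hann-windowed short-time Fourier transform, shape (n_frames, nfft // 2 + 1)."""
    win = np.hanning(nfft)
    n_frames = 1 + (len(x) - nfft) // hop
    frames = np.stack([x[i * hop:i * hop + nfft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def estimate_atf_ratio(z1, z2, nfft=512, hop=256, frames_per_seg=8):
    """Least-squares estimate of H(k) = ATF_2(k) / ATF_1(k).

    Assumes a nonstationary source and stationary noise, so that in every
    segment q: Phi_21(q, k) = H(k) * Phi_11(q, k) + u(k), with u(k) constant.
    """
    Z1, Z2 = stft(z1, nfft, hop), stft(z2, nfft, hop)
    n_seg = Z1.shape[0] // frames_per_seg
    shape = (n_seg, frames_per_seg, Z1.shape[1])
    Z1 = Z1[:n_seg * frames_per_seg].reshape(shape)
    Z2 = Z2[:n_seg * frames_per_seg].reshape(shape)
    phi11 = np.mean(np.abs(Z1) ** 2, axis=1)    # auto-PSD of mic 1, per segment
    phi21 = np.mean(Z2 * np.conj(Z1), axis=1)   # cross-PSD mic 2 / mic 1, per segment
    # Per-bin regression of phi21 on phi11: slope = H(k), intercept = u(k).
    d11 = phi11 - phi11.mean(axis=0)
    d21 = phi21 - phi21.mean(axis=0)
    return np.sum(d11 * d21, axis=0) / np.sum(d11 ** 2, axis=0)


def tdoa_from_atf_ratio(H, nfft, fs):
    """TDOA in seconds from the maximal peak of the relative impulse response."""
    h = np.fft.irfft(H, n=nfft)
    lag = int(np.argmax(np.abs(h)))
    if lag > nfft // 2:                 # upper half of the buffer holds negative lags
        lag -= nfft
    return lag / fs


# Synthetic check: mic 2 receives the source 5 samples after mic 1.
rng = np.random.default_rng(0)
fs, n, delay, block = 16000, 64000, 5, 2048
env = np.repeat(rng.uniform(0.1, 2.0, size=(n + delay) // block + 1), block)[:n + delay]
s = env * rng.standard_normal(n + delay)          # nonstationary, speech-like source
z1 = s[delay:delay + n] + 0.3 * rng.standard_normal(n)
z2 = s[:n] + 0.3 * rng.standard_normal(n)
H = estimate_atf_ratio(z1, z2)
tdoa = tdoa_from_atf_ratio(H, nfft=512, fs=fs)
print(f"estimated TDOA: {tdoa * fs:.0f} samples")
```

With a pure delay, the relative impulse response is a single spike at the delay; in a reverberant room it spreads out, and the strongest peak is taken as the TDOA estimate.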

In the second algorithmic step, we first evaluate the *Cramér-Rao lower bound* (CRLB) and show that in the usual TDOA-based localization scenario, where the inter-element spread of the microphone array is small relative to the distance to the source, estimating the source position in Cartesian coordinates is not practical. However, the azimuth and elevation angles, which carry the information that matters for camera steering, may be estimated reliably. Several algorithms for angle estimation are presented.
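
The range-versus-angle behaviour of the CRLB can be checked numerically. The sketch below assumes independent, identically distributed Gaussian TDOA errors with standard deviation `sigma`, a speed of sound of 343 m/s, and a hypothetical array of four 0.2 m microphone pairs spread over about 0.3 m, with the source about 4 m away. It forms the Fisher information `G^T G / sigma^2` from a finite-difference Jacobian `G` of the TDOAs with respect to (range, azimuth, elevation) and inverts it. The geometry and noise model are assumptions, not the setup or the exact derivation used in this work.

```python
import numpy as np

C = 343.0  # assumed speed of sound [m/s]


def source_position(theta):
    """Cartesian position from theta = (range [m], azimuth [rad], elevation [rad])."""
    r, az, el = theta
    return r * np.array([np.cos(el) * np.cos(az), np.cos(el) * np.sin(az), np.sin(el)])


def tdoas(p, pairs):
    """TDOA of each microphone pair (m1, m2): (|p - m2| - |p - m1|) / c."""
    return np.array([(np.linalg.norm(p - m2) - np.linalg.norm(p - m1)) / C
                     for m1, m2 in pairs])


def crlb_spherical(theta, pairs, sigma, step=1e-5):
    """CRLB matrix for (range, azimuth, elevation) under i.i.d. Gaussian TDOA errors."""
    G = np.empty((len(pairs), 3))
    for j in range(3):                       # central differences: d tau / d theta_j
        dt = np.zeros(3)
        dt[j] = step
        G[:, j] = (tdoas(source_position(theta + dt), pairs)
                   - tdoas(source_position(theta - dt), pairs)) / (2 * step)
    fisher = G.T @ G / sigma ** 2
    return np.linalg.inv(fisher)


# Hypothetical array: four pairs with 0.2 m spacing, centers within 0.3 m of the origin.
d = 0.2
centers = np.array([[0, 0, 0], [0.3, 0, 0], [0, 0.3, 0], [0, 0, 0.3]])
axes = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]], dtype=float)
pairs = [(c - d / 2 * a, c + d / 2 * a) for c, a in zip(centers, axes)]

theta = np.array([4.0, np.deg2rad(30.0), np.deg2rad(10.0)])
sigma = 1e-5                                  # assumed TDOA error std: 10 microseconds
std = np.sqrt(np.diag(crlb_spherical(theta, pairs, sigma)))
range_std = std[0]
cross_range_std = theta[0] * np.cos(theta[2]) * std[1]   # azimuth error in metres
print(f"range std {range_std:.2f} m, azimuth std {np.rad2deg(std[1]):.2f} deg, "
      f"elevation std {np.rad2deg(std[2]):.2f} deg")
```

Comparing `range_std` with `cross_range_std` shows the effect described above: moving the source along its line of sight changes the TDOAs much less than moving it sideways by the same distance, so the bound on the range error is far looser than the bound on the angles.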

We first suggest Gauss iterations. We then observe that, because the speaker's trajectory is smooth, estimates of nearby (preceding) speaker positions can be used to improve the current position estimate. Two localization algorithms that exploit this temporal dependency are proposed: the first is a recursive form of the Gauss method, and the second is the well-known *Extended Kalman Filter* (EKF).
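
The sketch below illustrates snapshot angle estimation followed by tracking, under a far-field model `tau_i = -(m2_i - m1_i) . u(az, el) / c` with hypothetical 0.2 m pair baselines. `gauss_newton` runs Gauss-Newton iterations on a single vector of TDOAs (we assume this is what the Gauss iterations above refer to), and `ekf_step` is a textbook EKF with a random-walk model on (azimuth, elevation). The recursive Gauss form is not shown. The far-field model, the noise levels `q` and `sigma`, and the trajectory are illustrative assumptions rather than the models or tuning used in this work.

```python
import numpy as np

C = 343.0  # assumed speed of sound [m/s]


def unit_vector(phi):
    az, el = phi
    return np.array([np.cos(el) * np.cos(az), np.cos(el) * np.sin(az), np.sin(el)])


def far_field_tdoas(phi, baselines):
    """Far-field TDOAs, one per pair: -(m2 - m1) . u(az, el) / c."""
    return -baselines @ unit_vector(phi) / C


def jacobian(phi, baselines):
    """Derivatives of the far-field TDOAs with respect to (az, el)."""
    az, el = phi
    du_daz = np.array([-np.cos(el) * np.sin(az), np.cos(el) * np.cos(az), 0.0])
    du_del = np.array([-np.sin(el) * np.cos(az), -np.sin(el) * np.sin(az), np.cos(el)])
    return -np.column_stack([baselines @ du_daz, baselines @ du_del]) / C


def gauss_newton(tau, baselines, phi0, n_iter=10):
    """Snapshot estimate: repeatedly linearize and solve the least-squares step."""
    phi = np.array(phi0, dtype=float)
    for _ in range(n_iter):
        J = jacobian(phi, baselines)
        residual = tau - far_field_tdoas(phi, baselines)
        phi = phi + np.linalg.lstsq(J, residual, rcond=None)[0]
    return phi


def ekf_step(phi, P, tau, baselines, q, sigma):
    """One EKF cycle with a random-walk state model on (az, el)."""
    P = P + q * np.eye(2)                                      # predict
    H = jacobian(phi, baselines)
    S = H @ P @ H.T + sigma ** 2 * np.eye(len(tau))
    K = P @ H.T @ np.linalg.inv(S)
    phi = phi + K @ (tau - far_field_tdoas(phi, baselines))     # update
    P = (np.eye(2) - K @ H) @ P
    return phi, P


rng = np.random.default_rng(1)
baselines = np.array([[0.2, 0, 0], [0, 0.2, 0], [0, 0, 0.2], [0.14, 0.14, 0]])
sigma = 1e-5                                    # assumed TDOA noise std [s]
true_az = np.deg2rad(np.linspace(20, 40, 200))  # slowly moving speaker
true_el = np.deg2rad(10.0)


def measure(phi):
    return far_field_tdoas(phi, baselines) + sigma * rng.standard_normal(len(baselines))


# Gauss-Newton on the first snapshot initializes the tracker.
phi = gauss_newton(measure((true_az[0], true_el)), baselines, phi0=(0.0, 0.0))
P = 1e-2 * np.eye(2)
for az in true_az[1:]:
    phi, P = ekf_step(phi, P, measure((az, true_el)), baselines,
                      q=np.deg2rad(0.2) ** 2, sigma=sigma)
err_deg = np.rad2deg(np.abs(phi - np.array([true_az[-1], true_el])))
print(f"final azimuth error {err_deg[0]:.2f} deg, elevation error {err_deg[1]:.2f} deg")
```

The snapshot estimate only initializes the filter; afterwards each EKF update blends the new TDOAs with the prediction carried over from the previous estimate, which is how the smoothness of the trajectory is exploited.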

The mathematical derivations in this work are followed by an extensive experimental study covering both static and tracking scenarios.