Ph.D Thesis

Ph.D Student: Abramson Ari
Subject: Markov-Switching GARCH Models and Applications to Digital Speech Processing
Department: Department of Electrical and Computer Engineering
Supervisor: Prof. Israel Cohen


This dissertation addresses the theory and applications of generalized autoregressive conditional heteroscedasticity (GARCH) models with Markov regimes for digital speech processing. The GARCH model is widely used in econometrics for deriving volatility forecasts of econometric rates, and it has recently been proposed in signal processing for applications such as speech enhancement, speech recognition, and voice activity detection.
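
For reference, the standard real-valued GARCH(p,q) recursion from the econometrics literature can be written as follows. The symbols $x_t$, $z_t$, $\sigma_t^2$, $\xi$, $\alpha_i$, and $\beta_j$ are notational assumptions made here for illustration and are not necessarily the notation used in the thesis.

```latex
% Standard GARCH(p,q) model (illustrative notation, assumed here)
\begin{align}
  x_t        &= \sigma_t z_t, \qquad z_t \ \text{i.i.d.},\ \mathrm{E}[z_t] = 0,\ \mathrm{E}[z_t^2] = 1, \\
  \sigma_t^2 &= \xi + \sum_{i=1}^{q} \alpha_i x_{t-i}^2 + \sum_{j=1}^{p} \beta_j \sigma_{t-j}^2,
  \qquad \xi > 0,\ \alpha_i \ge 0,\ \beta_j \ge 0 .
\end{align}
% For this single-regime model, wide-sense stationarity holds if and only if
% \sum_{i=1}^{q} \alpha_i + \sum_{j=1}^{p} \beta_j < 1 .
```

The conditional variance $\sigma_t^2$ depends on past squared observations and on its own past values, which is the conditional heteroscedasticity structure referred to above.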

In this thesis, we develop a new statistical model for nonstationary signals in the joint time-frequency domain, based on a GARCH formulation with Markov regimes. The proposed model exploits the advantages of both the conditional heteroscedasticity structure of GARCH models and the time-varying characteristics of hidden Markov chains. The main motivation for this research is spectral modeling of speech signals for hands-free communication applications such as speech enhancement, nonstationary noise reduction, dereverberation, and audio source separation.
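
To make the idea of a GARCH recursion driven by a hidden Markov chain concrete, the following is a minimal simulation sketch. It assumes a real-valued, time-domain, first-order model in which the regime selects the GARCH(1,1) parameters at each step. This is only one of several formulations found in the literature and is not necessarily the time-frequency-domain model developed in the thesis; the function name `simulate_ms_garch` and all parameter names and values are hypothetical.

```python
import numpy as np


def simulate_ms_garch(n, trans, xi, alpha, beta, seed=0):
    """Simulate a Markov-switching GARCH(1,1) process (illustrative sketch).

    Assumed recursion (one possible formulation, not the thesis's own):
        sigma2[t] = xi[s_t] + alpha[s_t] * x[t-1]**2 + beta[s_t] * sigma2[t-1]
        x[t]      = sqrt(sigma2[t]) * z_t,   z_t ~ N(0, 1)
    where s_t is a finite-state Markov chain with transition matrix `trans`.
    """
    rng = np.random.default_rng(seed)
    trans = np.asarray(trans, dtype=float)
    xi, alpha, beta = (np.asarray(v, dtype=float) for v in (xi, alpha, beta))
    n_states = trans.shape[0]

    states = np.empty(n, dtype=int)
    sigma2 = np.empty(n)
    x = np.empty(n)

    # Draw the initial regime uniformly and start the recursion from that
    # regime's single-regime stationary variance (a heuristic initial value).
    s = int(rng.integers(n_states))
    prev_sigma2 = xi[s] / max(1.0 - alpha[s] - beta[s], 1e-6)
    prev_x2 = prev_sigma2

    for t in range(n):
        if t > 0:
            # Regime transition: row s of `trans` gives P(s_t | s_{t-1} = s).
            s = int(rng.choice(n_states, p=trans[s]))
        # Regime-dependent conditional variance update.
        sigma2[t] = xi[s] + alpha[s] * prev_x2 + beta[s] * prev_sigma2
        x[t] = np.sqrt(sigma2[t]) * rng.standard_normal()
        states[t] = s
        prev_sigma2, prev_x2 = sigma2[t], x[t] ** 2

    return x, sigma2, states


# Hypothetical two-regime example: a low-volatility and a high-volatility state.
trans = [[0.95, 0.05], [0.10, 0.90]]
x, sigma2, states = simulate_ms_garch(
    500, trans, xi=[0.1, 1.0], alpha=[0.1, 0.2], beta=[0.8, 0.7]
)
```

In this sketch the conditional variance at each time depends on the whole history of regimes through `sigma2[t-1]`, which is straightforward to simulate but is also the reason that variance and state estimation for such models require dedicated algorithms.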

We analyze the asymptotic stationarity of Markov-switching GARCH (MS-GARCH) processes in the general case of (p,q)-order GARCH models with finite-state Markov chains. Necessary and sufficient conditions for asymptotic wide-sense stationarity are developed for several model formulations known in the literature. The properties of the proposed model are investigated, and algorithms are developed for estimating the conditional variance as well as the signal in noisy environments. The proposed model and the corresponding estimation algorithms are shown to be useful for speech enhancement and speech dereverberation. In addition, a state smoothing algorithm is developed for estimating the sequence of active states.

Furthermore, a new formulation of the speech enhancement problem is proposed in this thesis, which incorporates simultaneous operations of detection and estimation. A detector for speech presence in the short-time Fourier transform domain is combined with an estimator, and the two jointly minimize a cost function that takes into account both detection and estimation errors. We show that the proposed simultaneous detection and estimation approach enables greater noise reduction than an estimation-only approach, without further degrading the speech signal.

A simultaneous classification and estimation approach, together with GARCH modeling, is employed to develop an algorithm for single-sensor audio source separation. We show that for mixtures of speech and music signals, improved source separation can be achieved compared to using a Gaussian mixture model for both signals. Moreover, cost parameters enable one to control the trade-off between missed and false detections of the desired signal, and correspondingly the trade-off between signal distortion and residual interference.