Ph.D Thesis

Ph.D Student: Mousazadeh Saman
Subject: Speech and Audio Processing Based on Harmonic Analysis and Statistical Models
Department: Department of Electrical and Computer Engineering
Supervisor: Prof. Israel Cohen
Full thesis text: English version


This dissertation addresses the theory and applications of geometric and model-based methods in signal processing.

In the first part, we concentrate on geometric methods in signal processing. First, we address the voice activity detection problem. Speech/non-speech classification is an unsolved problem in speech processing and affects diverse applications, including robust speech recognition, discontinuous transmission, real-time speech transmission over the Internet, and combined noise reduction and echo cancellation schemes in telephony. We address the problem of voice activity detection in the presence of transient noise using geometric methods. More specifically, we use spectral clustering to perform this task.
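As a rough illustration of the general idea of spectral clustering, and not of the detector developed in the thesis, the following Python sketch splits a set of feature vectors into two groups using the second eigenvector of a normalized graph Laplacian. The function name `spectral_bipartition`, the Gaussian kernel width `sigma`, and the synthetic two-dimensional "frame features" are all assumptions made for this example.

```python
import numpy as np

def spectral_bipartition(features, sigma=1.0):
    """Split the rows of `features` into two clusters with a basic spectral method."""
    # Pairwise squared Euclidean distances between feature vectors.
    diff = features[:, None, :] - features[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)
    # Gaussian affinity matrix (sigma is an assumed kernel width).
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(features)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; use the second eigenvector.
    _, vecs = np.linalg.eigh(L)
    fiedler = vecs[:, 1] * d_inv_sqrt
    # The sign of each entry assigns the point to one of the two clusters.
    return (fiedler > 0).astype(int)

rng = np.random.default_rng(0)
# Two synthetic groups standing in for speech and non-speech frame features.
group_a = rng.normal(loc=0.0, scale=0.3, size=(20, 2))
group_b = rng.normal(loc=3.0, scale=0.3, size=(20, 2))
labels = spectral_bipartition(np.vstack([group_a, group_b]), sigma=1.0)
```

Which cluster receives label 0 or 1 depends on the arbitrary sign of the eigenvector, so in practice the labels would still need to be mapped to "speech" and "non-speech" by some additional rule.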

Embedding and function extension on directed graphs is the next topic presented in this thesis. We provide a graph embedding algorithm motivated by a Laplacian-type operator on a manifold. We also introduce a Nyström-type eigenfunction extension, which is used both for extending the embedding to new data points and for extending an empirical function to a new data set.
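For orientation, the classical Nyström extension for a symmetric kernel can be sketched as follows. This is a simplification: the thesis treats directed graphs, whereas this example assumes a symmetric Gaussian kernel, and the helper names `gaussian_kernel`, `top_eigenpairs`, and `nystrom_extend` are hypothetical. Each eigenvector phi with eigenvalue lambda is extended to a new point x by phi(x) = (1/lambda) * sum_j k(x, x_j) * phi_j.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian kernel matrix between the rows of `a` and the rows of `b`."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def top_eigenpairs(K, n_components):
    """Leading eigenvalues and eigenvectors of a symmetric kernel matrix."""
    vals, vecs = np.linalg.eigh(K)
    order = np.argsort(vals)[::-1][:n_components]
    return vals[order], vecs[:, order]

def nystrom_extend(new, train, vals, vecs, sigma=1.0):
    """Extend each eigenvector to `new` points: phi(x) = sum_j k(x, x_j) phi_j / lambda."""
    return gaussian_kernel(new, train, sigma) @ vecs / vals

rng = np.random.default_rng(0)
train = rng.normal(size=(30, 2))       # synthetic training points
new_points = rng.normal(size=(5, 2))   # synthetic out-of-sample points
vals, vecs = top_eigenpairs(gaussian_kernel(train, train), n_components=3)
extended = nystrom_extend(new_points, train, vals, vecs)
# Sanity check: extending back onto the training points reproduces the eigenvectors.
reproduced = nystrom_extend(train, train, vals, vecs)
```

The same formula extends an empirical function once it has been expanded in the retained eigenvectors, since each basis element can be evaluated at the new points.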

We also concentrate on the problem of extending a band-limited function defined on a homogeneous manifold. The proposed method has a closed-form solution and consists of matrix multiplication and inversion. As the number of data points approaches infinity, the proposed method converges to the optimal solution, as long as the function values are known on an appropriate sampling set.

In the second part of this thesis, we concentrate on model-based signal processing. We choose the ARCH model and its generalization, the GARCH model. The ARCH model is a statistical model that explicitly parametrizes a time-varying conditional variance as a function of past squared values of the process, while accounting for volatility clustering and excess kurtosis (i.e., a heavy-tailed distribution). We use this model along with the conventional autoregressive (AR) model and present a voice activity detection procedure based on AR-GARCH modeling of the speech signal in the time domain.
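To make the conditional-variance recursion concrete, the sketch below simulates a standard GARCH(1,1) process with Gaussian innovations; setting `beta = 0` gives an ARCH(1) process. The parameter values and the function name `simulate_garch11` are illustrative assumptions, not values taken from the thesis, and the AR component is omitted.

```python
import math
import random

def simulate_garch11(n, omega=0.1, alpha=0.1, beta=0.8, seed=0):
    """Simulate x_t = sqrt(h_t) * z_t with h_t = omega + alpha * x_{t-1}^2 + beta * h_{t-1}."""
    if alpha + beta >= 1.0:
        raise ValueError("alpha + beta must be < 1 for a finite unconditional variance")
    rng = random.Random(seed)
    # Start the recursion at the unconditional variance omega / (1 - alpha - beta).
    h = omega / (1.0 - alpha - beta)
    xs, hs = [], []
    for _ in range(n):
        x = math.sqrt(h) * rng.gauss(0.0, 1.0)  # z_t is standard normal
        xs.append(x)
        hs.append(h)
        # A large |x| raises the next conditional variance (volatility clustering).
        h = omega + alpha * x * x + beta * h
    return xs, hs

xs, hs = simulate_garch11(1000)
```

Plotting `xs` against `hs` shows bursts of large samples coinciding with periods of high conditional variance, which is the behaviour that makes this family of models attractive for signals with time-varying energy.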

The problem of anomaly detection in SONAR images is also addressed in this thesis. We introduce a noncausal AR-ARCH model for modeling the background of a SONAR image in the wavelet domain. An efficient method for estimating the model parameters, together with an analysis of this estimator, is also provided.

The last topic addressed in our research is parameter estimation of the ARCH and GARCH models in the presence of additive noise. We provide two different methods based on maximum likelihood estimation. In the first method, we find the exact likelihood function of the noisy observations and maximize it using gradient-based methods. Here, we assume that both the process noise and the corrupting noise are normally distributed. Since these assumptions might be violated in practical situations, we propose a second method, based on particle filters, for maximum likelihood estimation of the parameters of the ARCH and GARCH models. This method combines gradient descent with an active set method to maximize the likelihood function over the parameters under stationarity constraints. The gradient of the likelihood of the observations given the model parameters, which is needed by the gradient-based optimization algorithm, is estimated using particle methods.
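The role of the particle filter can be illustrated with a minimal bootstrap filter that estimates the log-likelihood of noisy ARCH(1) observations for a given parameter setting. This sketch is only an assumption-laden illustration: it estimates the likelihood itself rather than its gradient, it does not include the gradient descent or active set steps, and the names `pf_loglik` and `normal_pdf`, the Gaussian noise, and all numerical values are chosen for the example.

```python
import math
import random

def normal_pdf(x, var):
    """Density of a zero-mean normal distribution with variance `var`."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)

def pf_loglik(ys, omega, alpha, noise_var, n_particles=500, seed=0):
    """Bootstrap particle filter estimate of log p(y_1..y_T | omega, alpha, noise_var).

    Assumed model: x_t = sqrt(omega + alpha * x_{t-1}^2) * z_t, y_t = x_t + v_t,
    with z_t ~ N(0, 1) and v_t ~ N(0, noise_var).
    """
    rng = random.Random(seed)
    # Initialise particles from the stationary variance omega / (1 - alpha).
    init_sd = math.sqrt(omega / (1.0 - alpha))
    particles = [rng.gauss(0.0, init_sd) for _ in range(n_particles)]
    loglik = 0.0
    for y in ys:
        # Propagate each particle through the ARCH(1) state equation.
        particles = [math.sqrt(omega + alpha * x * x) * rng.gauss(0.0, 1.0)
                     for x in particles]
        # Weight particles by the observation density p(y_t | x_t).
        weights = [normal_pdf(y - x, noise_var) for x in particles]
        mean_w = sum(weights) / n_particles
        if mean_w == 0.0:
            return float("-inf")  # all particles incompatible with the observation
        loglik += math.log(mean_w)
        # Multinomial resampling proportional to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return loglik

# Synthetic noisy ARCH(1) data with omega = 0.5, alpha = 0.3, noise variance 0.25.
data_rng = random.Random(1)
x, ys = 0.0, []
for _ in range(200):
    x = math.sqrt(0.5 + 0.3 * x * x) * data_rng.gauss(0.0, 1.0)
    ys.append(x + data_rng.gauss(0.0, 0.5))

ll_true = pf_loglik(ys, omega=0.5, alpha=0.3, noise_var=0.25)
ll_wrong = pf_loglik(ys, omega=50.0, alpha=0.3, noise_var=0.25)
```

Comparing `ll_true` with `ll_wrong` shows how such an estimate can rank candidate parameter values; a maximum likelihood procedure would search over the parameters while keeping them inside the stationarity region (here, `alpha < 1`).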