M.Sc Thesis

M.Sc Student: Rapaport Guy
Subject: Codebook-Based Single-Channel Blind Source Separation of Audio Signals
Department: Department of Electrical and Computer Engineering
Supervisor: Prof. Israel Cohen


In this thesis we address the challenge of single-channel blind source separation (BSS). Single-channel BSS is an extreme case of the under-determined BSS problem, in which only a single linear instantaneous mixture of two sources is observed. Because the problem is under-determined, a priori information about the sources must be incorporated in order to separate them successfully from their mixture.
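Why a prior is unavoidable can be seen with a toy example. The sketch below assumes the simplest instantaneous model, x = s1 + s2 sample by sample, and uses made-up sample values. Every guess for the first source yields a second source that reproduces the observed mixture exactly, so the mixture by itself cannot single out the true pair.

```python
# Toy illustration with hypothetical values: a single-channel mixture
# x = s1 + s2 does not determine s1 and s2 on its own.

def mix(s1, s2):
    """Instantaneous single-channel mixture: sample-wise sum of the two sources."""
    return [a + b for a, b in zip(s1, s2)]

x = mix([1.0, 0.0, -1.0], [0.5, 0.5, 0.5])  # observed mixture

# Any guess for source 1 gives a source 2 (the residual x - s1) that
# reproduces x exactly, so all of these decompositions fit the data equally well.
for guess_s1 in ([1.0, 0.0, -1.0], [0.0, 0.0, 0.0], [2.0, -3.0, 7.0]):
    guess_s2 = [xi - gi for xi, gi in zip(x, guess_s1)]
    print(guess_s1, guess_s2, mix(guess_s1, guess_s2) == x)
```

All three candidate pairs print `True`; only prior knowledge about what each source typically looks like can rank them.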

A variety of priors have been suggested within the framework of single-channel BSS, among them conceptual cues, statistical source modeling and codebook (CB) based source representation. Except in special cases, current single-channel BSS solutions do not yet appear mature enough for real-life applications.

Throughout this research, we focus on three types of CB-based separation algorithms. The first is derived from the Gaussian mixture model (GMM), the second uses a dictionary of autoregressive (AR) processes, and the third is based on non-negative matrix factorization (NMF). Each of these algorithms uses a CB for each source as a prior in the mixture separation scheme.
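The role of the per-source CB can be sketched for a single STFT frame. This is a deliberately simplified, hard-decision variant and not the thesis's algorithm: each CB is assumed to be a short list of PSD vectors, every pair of codewords (one per source) is scored against the observed periodogram, and the best pair defines Wiener gains. The scoring uses the Itakura-Saito divergence, which, for zero-mean complex Gaussian frames, equals the negative log-likelihood up to constants; a full GMM-based separator would typically work with component likelihoods and priors, and AR-based variants would also estimate per-frame gains. The function names and codebook values are hypothetical toy choices.

```python
import math
from itertools import product

def is_divergence(p, phi):
    """Itakura-Saito divergence between an observed periodogram p and a model PSD phi."""
    return sum(pk / fk - math.log(pk / fk) - 1.0 for pk, fk in zip(p, phi))

def separate_frame(p_obs, cb1, cb2):
    """Pick the codeword pair whose summed PSD best explains p_obs, then return Wiener gains."""
    best = None
    for (i, c1), (j, c2) in product(enumerate(cb1), enumerate(cb2)):
        # Under the independence assumption, the mixture PSD is the sum of the source PSDs.
        model = [a + b for a, b in zip(c1, c2)]
        cost = is_divergence(p_obs, model)
        if best is None or cost < best[0]:
            best = (cost, i, j)
    _, i, j = best
    # Per-bin Wiener gains; multiplying the mixture STFT frame by g1 and g2
    # gives the two source estimates for this frame.
    g1 = [a / (a + b) for a, b in zip(cb1[i], cb2[j])]
    g2 = [b / (a + b) for a, b in zip(cb1[i], cb2[j])]
    return i, j, g1, g2

# Hypothetical 4-bin codebooks: source 1 is "low-pass", source 2 is "high-pass".
cb1 = [[4.0, 2.0, 0.5, 0.1], [1.0, 1.0, 1.0, 1.0]]
cb2 = [[0.1, 0.5, 2.0, 4.0], [0.2, 0.2, 0.2, 0.2]]
p_obs = [4.1, 2.5, 2.5, 4.1]  # equals cb1[0] + cb2[0]

i, j, g1, g2 = separate_frame(p_obs, cb1, cb2)
print(i, j)  # 0 0
```

The exhaustive pair search costs |CB1| x |CB2| evaluations per frame, which is why practical systems keep codebooks small or prune the search.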

We propose two novel CB-based separation algorithms. First, we introduce a generalization of the GMM/AR-based separation scheme. The GMM/AR-based separation cost function treats all frequency bins (in the STFT domain) identically; our scheme instead introduces a frequency-dependent cost function, which makes it possible to treat frequency bins differently according to their observed energy or according to the characteristics of the source. Second, we introduce an additional prior into the GMM/AR-based separation cost function. Under the assumption that the sources are statistically independent, the original cost function only requires that the sum of the power spectral densities (PSDs) of the estimated sources be similar to the observed PSD. Our addition also takes into account how 'distant' the PSDs of the two estimated sources are from each other.
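Both proposals modify the per-frame cost. The sketch below is an illustration under stated assumptions, not the thesis's formulation: the data term is assumed to be an Itakura-Saito fit between the observed periodogram and the summed PSD estimates, the frequency dependence is assumed to enter as per-bin weights (all ones recovers the unweighted cost), and the 'distance' between the two estimated PSDs is assumed to be a mean squared log-ratio. The sign and scale of that extra term, the parameter `lam`, and all function names are hypothetical.

```python
import math

def weighted_is_cost(p_obs, phi1, phi2, weights):
    """Frequency-weighted Itakura-Saito cost between p_obs and phi1 + phi2.
    weights[k] scales the contribution of bin k; all ones gives the unweighted cost."""
    total = 0.0
    for pk, a, b, wk in zip(p_obs, phi1, phi2, weights):
        r = pk / (a + b)
        total += wk * (r - math.log(r) - 1.0)
    return total

def log_spectral_distance(phi1, phi2):
    """Mean squared log-ratio of two PSDs: one possible measure of how 'distant' they are."""
    return sum(math.log(a / b) ** 2 for a, b in zip(phi1, phi2)) / len(phi1)

def regularized_cost(p_obs, phi1, phi2, weights, lam):
    """Hypothetical combined cost: weighted data fit minus lam times the PSD distance.
    With lam > 0, candidate pairs whose PSDs differ more are favoured."""
    return weighted_is_cost(p_obs, phi1, phi2, weights) - lam * log_spectral_distance(phi1, phi2)

p_obs = [4.0, 2.0, 0.5, 0.1]
# Hypothetical energy-based weighting: emphasise bins with more observed energy
# (normalised so the weights average to one).
w = [len(p_obs) * p / sum(p_obs) for p in p_obs]
print(regularized_cost(p_obs, [3.0, 1.5, 0.3, 0.05], [1.0, 0.5, 0.2, 0.05], w, lam=0.1))
```

Source-dependent weights (for example, emphasising bins where one source's codebook carries most of its energy) would plug into the same `weights` argument.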

Finally, we evaluate the separation performance of the GMM-, AR- and NMF-based algorithms and of the two proposed algorithms in two real audio separation scenarios. The GMM-based source separation algorithm outperformed the AR- and NMF-based separation algorithms; specifically, the best performance among these baseline algorithms was obtained with a generalization of the GMM, the Gaussian scaled mixture model (GSMM). We further show that the frequency-dependent separation algorithm outperforms the GSMM-based separation algorithm. However, adding the 'distant' PSDs prior does not improve the separation results relative to the GSMM-based separation algorithm.