M.Sc Student: Litvin Yevgeni
Subject: Single-Channel Blind Source Separation of Audio Signals
Department: Department of Electrical Engineering
Supervisors: Professor Israel Cohen, Dr. Dan Chazan
Full Thesis text
In this thesis we address the problem of audio source separation from a single audio channel. Blind source separation (BSS) from a single channel is a special case of the general BSS problem in which only one mixture observation is available to the algorithm. The problem becomes more tractable when the mixed signals belong to different signal classes that can be distinguished, based on prior knowledge, using existing statistical learning techniques.
We define and study three different algorithms. First, we note that for some sets of signal classes the frequency-modulating (FM) component of the sub-band signals carries discriminative information; this holds, for example, in the important case of speech and music signals. We use the time-localized energy of the FM component to classify time-frequency bins and create a binary mask that rejects the undesired signal. The difference between the sub-band FM signal energies of speech and music, together with the sparseness and independence of the mixture components, makes the separation possible. We show that the proposed algorithm outperforms a competing source separation algorithm.
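The masking idea above can be illustrated with a minimal sketch. This is not the thesis algorithm itself: the FM-energy feature here is a hypothetical proxy (the deviation of each bin's frame-to-frame phase increment from its per-bin average), and the two sources are synthetic, an FM-rich "speech-like" tone and a steady "music-like" tone. The sketch only shows the mechanics of building a binary time-frequency mask from a discriminative feature and applying it to the mixture's STFT.

```python
import numpy as np
from scipy.signal import stft, istft

fs = 8000
t = np.arange(fs) / fs
# Toy "speech-like" source: strong frequency modulation (vibrato-like)
speech = np.sin(2 * np.pi * (300 * t + 40 * np.sin(2 * np.pi * 4 * t)))
# Toy "music-like" source: a steady tone with almost no FM
music = 0.8 * np.sin(2 * np.pi * 1800 * t)
mix = speech + music

f, frames, Z = stft(mix, fs=fs, nperseg=256)
# Hypothetical FM proxy per TF bin: deviation of the local phase increment
# from its per-bin average; FM-rich bins deviate more over time
dphi = np.diff(np.unwrap(np.angle(Z), axis=1), axis=1)
fm_dev = np.abs(dphi - dphi.mean(axis=1, keepdims=True))
fm_dev = np.pad(fm_dev, ((0, 0), (1, 0)), mode="edge")  # restore frame count
mask = (fm_dev > np.median(fm_dev)).astype(float)       # binary TF mask
_, estimate = istft(Z * mask, fs=fs, nperseg=256)       # masked reconstruction
```

The mask keeps bins whose feature exceeds a global threshold (here, the median); the thesis instead classifies bins using learned class statistics.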
In the second algorithm we use a Bark-scaled (BS) wavelet packet decomposition (WPD) analysis. BS-WPD analysis was previously used for speech enhancement. We introduce a modification of the BS-WPD analysis and combine it with an existing BSS algorithm based on Gaussian mixture modeling (GMM). In the first stage of the algorithm, the signal is analyzed using the modified BS-WPD analysis and a Gaussian mixture model is trained. In the second stage, the mixed signal is separated using the statistical model. The baseline separation algorithm relies on differences between the statistical model parameters of the source classes. The proposed psycho-acoustically motivated non-uniform filter-bank structure reduces the dimension of the feature vectors, which simplifies the training procedure of the statistical model and in some scenarios yields better performance.
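The dimension-reduction argument can be sketched as follows. This is an illustrative stand-in, not the thesis's wavelet packet tree: it groups linear STFT bins into Bark-like bands using the Traunmüller approximation of the Bark scale, reducing a 257-dimensional spectral frame to 20 psycho-acoustically spaced band energies, which is the kind of compact feature vector a GMM is easier to train on.

```python
import numpy as np
from scipy.signal import stft

def bark_bands(freqs, n_bands=20):
    """Assign each linear frequency to a Bark-like band (Traunmueller approx.)."""
    bark = 26.81 * freqs / (1960.0 + freqs) - 0.53
    edges = np.linspace(bark.min(), bark.max(), n_bands + 1)
    return np.clip(np.digitize(bark, edges) - 1, 0, n_bands - 1)

fs = 16000
x = np.random.default_rng(0).standard_normal(fs)  # stand-in audio signal
f, frames, Z = stft(x, fs=fs, nperseg=512)
idx = bark_bands(f, n_bands=20)
# Collapse 257 linear bins into 20 non-uniform band energies per frame;
# bands are narrow at low frequencies and wide at high frequencies
feat = np.vstack([np.abs(Z[idx == b]).mean(axis=0) for b in range(20)])
```

Each column of `feat` is a low-dimensional feature vector for one frame, ready for GMM training; the thesis achieves the non-uniform resolution directly in the analysis stage via the BS-WPD filter bank rather than by regrouping STFT bins.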
Finally, we define the short-time spectral kurtosis (STSK) as a time-localized estimate of the spectral kurtosis. We use the STSK value as a local time-frequency feature for classifying time-frequency bins and create a binary mask from it. The algorithm relies on the discriminative properties of the STSK together with the sparseness and independence of the mixed signals; the resulting mask rejects the undesired signal. Experimental results demonstrate good audio source separation performance.
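A minimal sketch of a time-localized spectral kurtosis feature, under assumed details not taken from the thesis: the local estimate here is simply the excess kurtosis of the STFT magnitudes over a sliding window of W frames per frequency bin, and the toy mixture combines a steady tone (near-Gaussian local spectrum) with impulsive clicks (heavy-tailed, high-kurtosis).

```python
import numpy as np
from scipy.signal import stft
from scipy.stats import kurtosis

fs = 8000
t = np.arange(2 * fs) / fs
tone = np.sin(2 * np.pi * 440 * t)        # steady, low-kurtosis spectrum
clicks = np.zeros_like(t)
clicks[::2000] = 5.0                      # impulsive, heavy-tailed component
mix = tone + clicks

f, frames, Z = stft(mix, fs=fs, nperseg=256)
mag = np.abs(Z)
W = 8  # assumed window length (in frames) for the local estimate
n_win = mag.shape[1] - W + 1
# Excess kurtosis of each bin's magnitudes over a sliding W-frame window
stsk = np.stack([kurtosis(mag[:, i:i + W], axis=1, fisher=True)
                 for i in range(n_win)], axis=1)
mask = (stsk > 0).astype(float)  # keep bins with heavy-tailed local spectrum
```

Thresholding at zero excess kurtosis is only a placeholder decision rule; the thesis derives the classification of time-frequency bins from the discriminative behaviour of the STSK across signal classes.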