M.Sc Thesis

M.Sc Student: Sagi Ariel
Subject: Data Embedding in Speech Signals
Department: Department of Electrical and Computer Engineering
Supervisor: Professor Emeritus David Malah


In this work, a data embedding technique for speech signals, exploiting the masking property of the human auditory system, is developed. The signal in the frequency domain is partitioned into subbands. The data embedding parameters of each subband are computed from the auditory masking threshold function and a channel noise estimate. Data embedding is performed by modifying the discrete Hartley transform coefficients according to the principles of the scalar Costa scheme (SCS).
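The embedding step described above can be illustrated with a minimal sketch of binary SCS applied to discrete Hartley transform (DHT) coefficients. This is not the thesis implementation: the function names, the single fixed quantization step `step`, the scale factor `alpha`, and the shared pseudo-random `dither` are assumptions made for illustration. In the scheme described above, the embedding parameters are instead set per subband from the masking threshold and a channel noise estimate.

```python
import numpy as np

def dht(x):
    """Discrete Hartley transform computed from the FFT: H[k] = Re(X[k]) - Im(X[k])."""
    X = np.fft.fft(x)
    return X.real - X.imag

def idht(h):
    """The DHT is its own inverse up to a factor of 1/N."""
    return dht(h) / len(h)

def scs_embed(coeffs, bits, step, alpha, dither):
    """Binary SCS: move each coefficient a fraction alpha toward the nearest
    point of the quantizer lattice selected by its bit and dither value."""
    shift = step * (bits / 2.0 + dither)
    nearest = step * np.round((coeffs - shift) / step) + shift
    return coeffs + alpha * (nearest - coeffs)

def scs_decode(coeffs, step, dither):
    """Hard decision: choose the bit whose lattice lies closer to the coefficient."""
    r = np.mod(coeffs - step * dither, step)  # position within one step, in [0, step)
    # The bit-0 lattice sits at 0 (mod step), the bit-1 lattice at step / 2.
    return ((r > step / 4) & (r < 3 * step / 4)).astype(int)
```

In a noise-free round trip (DHT, embed, inverse DHT, DHT, decode) the bits are recovered exactly whenever `alpha` exceeds 0.5, because each marked coefficient then lies within a quarter step of its intended lattice point.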

Data-embedding systems are subject to many types of attacks, both deliberate and unintentional. This work considers a specific unintentional attack: the degradation caused by transmitting a speech signal with embedded data over a telephone channel.

Adaptive equalization is applied to improve robustness against telephone channel degradations by reducing the spectral distortion introduced by the channel. Classical channel equalization methods, such as the normalized least mean squares (NLMS) and recursive least squares (RLS) algorithms, are examined. A novel subband-structure RLS algorithm is developed and compared to the classical algorithms.
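As a point of reference for the classical methods mentioned above, the following is a minimal sketch of a time-domain NLMS equalizer. It assumes a known training signal `reference` is available at the receiver, which the abstract does not state. The function name, tap count, and step size `mu` are illustrative choices, and the subband RLS algorithm developed in this work is not shown.

```python
import numpy as np

def nlms_equalize(received, reference, num_taps=8, mu=0.5, eps=1e-8):
    """Adapt an FIR equalizer so that the filtered received signal tracks
    the reference signal, using the normalized LMS update."""
    w = np.zeros(num_taps)
    out = np.zeros(len(received))
    for n in range(num_taps - 1, len(received)):
        u = received[n - num_taps + 1:n + 1][::-1]  # most recent sample first
        out[n] = w @ u
        err = reference[n] - out[n]
        # Step size is normalized by the input energy in the tap window.
        w += mu * err * u / (eps + u @ u)
    return out, w
```

For a mild two-tap channel such as `[1.0, 0.4]`, the adapted taps approach the truncated inverse of the channel, so the first two taps settle near 1 and -0.4.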

Blind detection is needed when the decoder does not know the embedding parameters, which include the presence of embedded data in each subband and the subband quantization step. For this purpose, a maximum likelihood detector of the embedding parameters is developed. The demonstrated system achieves transparent data embedding (MOS = 3.9) at a rate of 600 information bits/second with a low bit error rate of approximately 7e-5.

The data-embedding technique is demonstrated by embedding data in a narrowband speech signal transmitted over a telephone channel. The embedded data can be exploited for wideband speech reconstruction. Using linear prediction techniques, the wideband speech is reconstructed from two components: a synthetic wideband excitation generated from the narrowband speech, and a wideband spectral envelope that is parametrically represented and transmitted as embedded data in the narrowband speech. The average LSD obtained, measured over the 3-8 kHz range, was 3.1 dB.
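The LSD figure can be made concrete with a short sketch, taking LSD to mean the log-spectral distance in its common root-mean-square form over a frequency band. The abstract does not give the exact definition used, so this form is an assumption, as are the 16 kHz sampling rate and the use of raw FFT power spectra in place of linear-prediction spectral envelopes. The result is for one frame; an average over frames would give a figure comparable to the one quoted.

```python
import numpy as np

def log_spectral_distance(ref_frame, test_frame, fs=16000,
                          f_lo=3000.0, f_hi=8000.0, eps=1e-12):
    """RMS difference, in dB, between the power spectra of two frames,
    restricted to the band [f_lo, f_hi] in Hz."""
    freqs = np.fft.rfftfreq(len(ref_frame), d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    p_ref = np.abs(np.fft.rfft(ref_frame)) ** 2 + eps    # eps guards against log(0)
    p_test = np.abs(np.fft.rfft(test_frame)) ** 2 + eps
    diff_db = 10.0 * np.log10(p_ref[band] / p_test[band])
    return np.sqrt(np.mean(diff_db ** 2))
```

Under this definition, two identical frames give 0 dB, and scaling one frame's amplitude by 2 gives about 6.02 dB in every bin, hence an LSD of about 6.02 dB.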