Technion - Israel Institute of Technology, Graduate School
M.Sc Thesis
M.Sc Student: Bahat Yuval
Subject: Multimodal Audio Inpainting
Department: Department of Electrical Engineering
Supervisor: Professor Yoav Schechner
Full thesis text: English version


Abstract

The popularity of voice over internet protocol (VoIP) systems is continuously growing.

In many cases, VoIP systems are part of video conference applications, which convey video in addition to audio. VoIP relies on unreliable internet communication, in which chunks of data often get lost during transmission. Various solutions to this problem have been proposed, most of which are better suited to low rates of data loss, which induce relatively short audio gaps. This work addresses the problem with an example-based approach that aims at a perceptually plausible result: data gaps are filled using audio examples taken from previously recorded speech of the same speaker. The example used to fill each gap is chosen, from among all those collected, as the one that best matches the gap. Matching uses several types of features, both auditory and visual (the latter in the case of a video conference application). We also examine the use of statistical prior knowledge to improve example matching performance. Finally, several audio synthesis techniques are applied to produce a smooth reconstructed audio signal. The effectiveness of the proposed solution is demonstrated experimentally, even for large data gaps, which cannot be handled by standard packet loss concealment (PLC) techniques.
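
The example-matching and smoothing steps described above can be illustrated with a minimal sketch. This is not the thesis implementation: the abstract does not specify the features, the distance measure, the statistical prior, or the synthesis techniques. The sketch assumes each gap and each stored example is already summarised by a fixed-length feature vector (for instance, audio features from around the gap concatenated with visual features), uses a plain squared Euclidean distance for matching, and uses a linear cross-fade for joining. The function names `squared_distance`, `best_example` and `crossfade` are hypothetical.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_example(gap_features: Sequence[float],
                 example_features: Sequence[Sequence[float]]) -> int:
    """Return the index of the stored example closest to the gap's features.

    Assumption: nearest-neighbour matching on a single feature vector;
    the thesis may weight features or add a statistical prior.
    """
    if not example_features:
        raise ValueError("no examples to match against")
    distances = [squared_distance(gap_features, f) for f in example_features]
    return min(range(len(distances)), key=distances.__getitem__)


def crossfade(left: Sequence[float], right: Sequence[float],
              overlap: int) -> List[float]:
    """Join two sample sequences with a linear cross-fade.

    The last `overlap` samples of `left` are blended with the first
    `overlap` samples of `right`; the rest is copied unchanged.
    """
    if overlap < 0 or overlap > min(len(left), len(right)):
        raise ValueError("overlap must fit inside both sequences")
    if overlap == 0:
        return list(left) + list(right)
    start = len(left) - overlap
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of `right`, rising toward 1
        blended.append((1 - w) * left[start + i] + w * right[i])
    return list(left[:start]) + blended + list(right[overlap:])
```

In use, `best_example` would pick which stored audio segment fills a given gap, and `crossfade` would be applied at both ends of the inserted segment to avoid audible discontinuities at the seams.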