M.Sc Student: Baruhov Elena
Subject: Cyclic Generative Models for Depth Image Enhancement
Department: Department of Electrical Engineering
Supervisor: Professor Guy Gilboa
Deep generative models have demonstrated enormous success in recent years, achieving breakthrough results in a wide array of computer vision applications. In particular, combined with adversarial losses, these modern deep architectures have been shown to produce detailed and visually convincing results, dramatically surpassing previous classical methods. The distinguishing feature of the adversarial loss is that it replaces manually designed penalties such as the MSE and MAE with a learned penalty, trained jointly with the network for the specific task at hand. Among the many successful applications of these Generative Adversarial Networks (GANs) we find image deblurring, inpainting, super-resolution, colorization, style transfer, and more.
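The contrast between a hand-designed pixel penalty and a learned adversarial penalty can be sketched as follows. This is a minimal NumPy illustration of the two loss types, not the implementation used in this work; in practice the discriminator logits come from a jointly trained network.

```python
import numpy as np

def mse_loss(pred, target):
    # Hand-designed penalty: mean squared error, fixed in advance
    # and independent of the task.
    return np.mean((pred - target) ** 2)

def adversarial_g_loss(d_fake_logits):
    # Learned penalty: the non-saturating GAN generator loss
    # -log D(G(x)), where the logits of D encode a task-specific,
    # trained notion of "realism".
    probs = 1.0 / (1.0 + np.exp(-d_fake_logits))  # sigmoid
    return -np.mean(np.log(probs + 1e-8))
```

Unlike the MSE, the adversarial penalty changes over training as the discriminator improves, which is what allows it to enforce perceptual detail that fixed pixel-wise losses blur away.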
One of the most influential contributions to the GAN framework has been the Cycle-GAN, which learns a domain transfer function between two image domains, while requiring no ground truth input-output pairs. Instead, the Cycle-GAN architecture accepts only unpaired examples from each domain, and formulates the problem as an unsupervised task. The training is done by utilizing cycle-consistency as an information-retention criterion in combination with the adversarial losses, and leads to state-of-the-art results in translation tasks such as photo/painting, summer/winter, map/aerial photo, and many others.
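The cycle-consistency criterion mentioned above can be sketched in a few lines. This is a schematic NumPy version, assuming two generators G: X→Y and F: Y→X (the standard Cycle-GAN notation); the actual networks are deep convolutional models.

```python
import numpy as np

def cycle_consistency_loss(x, G, F):
    # Information-retention criterion: an image translated to the
    # other domain and back should reconstruct the original,
    # penalized here with an L1 distance: ||F(G(x)) - x||_1.
    return np.mean(np.abs(F(G(x)) - x))
```

This term is what lets Cycle-GAN train without paired examples: the adversarial losses push G(x) to look like the target domain, while the cycle term forces the translation to remain invertible and thus content-preserving.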
This work focuses on the application of the Cycle-GAN framework to depth image enhancement. Depth images pose a particular challenge to deep networks due to the presence of unknown pixels and complex noise patterns. Specifically, we aim to enhance images from a real-world depth sensor, for which an analytic noise model is not available. In the absence of ground truth image pairs for this task, we instead formulate this as a domain-transfer problem between a low-quality sensor domain and a high-quality sensor domain. For the low-quality camera we select the Intel RealSense R200 stereo camera, and for the high-quality camera we select the Microsoft Kinect 2 time-of-flight camera. Using a dataset of unpaired, freely-captured depth images from each of the two cameras, we train a translation network able to bring the RealSense images to near Kinect quality.
We begin with the original Cycle-GAN network and discuss its limitations for this challenging task. We describe the sources of these limitations: the complexity of the transform, the asymmetry of the task, and the lack of information equivalence between the two domains. We then propose several modifications to the framework. First, we replace the relatively small generator architectures with much larger ones, which have sufficient representational capacity for the translation task. We also employ depth-specific losses which take the missing pixels into account. Finally, we propose a novel tri-cycle loss as an alternative information-retention metric, addressing the asymmetry between the two domains.
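The idea behind a depth-specific loss that accounts for missing pixels can be sketched as a masked L1 penalty. This is an illustrative NumPy sketch only: it assumes missing depth readings are encoded as zeros, which is a common convention but an assumption here; the precise losses, and the tri-cycle formulation itself, are defined in the thesis.

```python
import numpy as np

def masked_l1_loss(pred, target, invalid_value=0.0):
    # Compare only pixels where the target depth is valid.
    # invalid_value marks missing readings (assumption: zeros,
    # as in many raw depth-sensor outputs).
    mask = target != invalid_value
    if not np.any(mask):
        return 0.0  # no valid pixels to compare
    return np.mean(np.abs(pred[mask] - target[mask]))
```

Without such masking, a plain L1/MSE loss would penalize the network for "hallucinating" plausible depth in regions where the reference sensor simply failed to measure anything.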
We show that by combining these contributions, the resulting network is able to dramatically increase the quality of the RealSense images, far surpassing the results of the original Cycle-GAN formulation. We conclude that the proposed improvements effectively extend the applicability of the Cycle-GAN framework to more challenging and asymmetric tasks, and provide a new useful tool for handling domain-transfer problems.