M.Sc Thesis


M.Sc Student: Azaria Reut
Subject: Fast Monocular Depth Estimation for Autonomous Underwater Vehicles
Department: Department of Electrical and Computer Engineering
Supervisor: Associate Prof. Guy Gilboa


Abstract

Estimating an accurate depth map of a scene is essential for navigation and collision avoidance of autonomous vehicles. Although several methods exist and perform well above water, for the underwater environment this task is challenging not only due to the optical effect of the water but also due to the lack of datasets for deep learning methods.

In this research, we consider the problem of underwater dense depth completion: sparse depth measurements are given, and the remaining depth values are interpolated with the aid of the input images. In principle, two cameras can be mounted on a robot to compute stereo disparities. Yet this is highly impractical for small, agile systems, as the required baseline significantly increases drag and limits the platform's capabilities. Thus, we focus on monocular depth estimation that is accurate enough to enable safe, real-time navigation of an autonomous underwater vehicle (AUV).

We propose a training framework that takes input images from a monocular camera together with sparse depth measurements generated by real-time SLAM. Our model uses an adjusted loss function designed to minimize the error in the short navigation range. Alongside the depth map, our model outputs the uncertainty of its predictions, indicating how accurate each prediction is expected to be.
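As an illustration of how a loss can both favor the short navigation range and yield a per-pixel uncertainty, the following minimal sketch weights a Laplace negative log-likelihood more heavily for nearby reference depths. It is not the loss used in this thesis, whose exact form the abstract does not state. The function name `range_weighted_nll`, the Laplace likelihood, and the parameters `near_range` and `near_weight` are all assumptions made for this example.

```python
import numpy as np

def range_weighted_nll(pred, log_b, target, valid, near_range=3.0, near_weight=4.0):
    """Laplace negative log-likelihood with extra weight on near-range pixels.

    pred    : predicted depth in metres
    log_b   : predicted log of the Laplace scale (the per-pixel uncertainty)
    target  : reference depth in metres (e.g. sparse SLAM points)
    valid   : boolean mask, True where a reference depth exists
    near_range, near_weight : hypothetical values chosen for illustration
    """
    pred = np.asarray(pred, dtype=float)
    log_b = np.asarray(log_b, dtype=float)
    target = np.asarray(target, dtype=float)
    valid = np.asarray(valid, dtype=bool)

    # Laplace NLL up to a constant: |error| / b + log(b)
    nll = np.abs(pred - target) * np.exp(-log_b) + log_b

    # Pixels closer than near_range count more; pixels without reference count zero.
    weights = np.where(target <= near_range, near_weight, 1.0) * valid

    return float((weights * nll).sum() / max(weights.sum(), 1e-12))
```

With this form, a large predicted scale reduces the penalty for a large error but is itself penalized through the `log_b` term, so the scale can be read as a confidence estimate. An error of the same size costs more on a pixel within `near_range` than on a distant one.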

We conduct experiments on a new dataset collected by the Marine Imaging Lab (Haifa University) and show that our architecture becomes more accurate as the distance decreases, and hence can be applied to real-time navigation. We evaluate our model with a new error measure that estimates the percentage of inaccurate predictions that are deeper than the ground truth. A prediction deeper than the ground truth places an obstacle farther away than it really is, which is the hazardous case for collision avoidance. We show that our model's precision provides safe navigation for an AUV, with an error of 5% in the short range of up to 3 m. Our proposed network runs at 3 fps on average on an NVIDIA Jetson Xavier GPU.
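One possible reading of the error measure described above can be sketched as follows. The abstract does not give the exact definition, so the details here are assumptions: a prediction counts as inaccurate and too deep when it exceeds the ground truth by more than a relative tolerance `rel_tol`, only pixels with ground truth up to `max_range` are considered, and the percentage is taken over all such pixels. The function name `deeper_error_rate` is hypothetical.

```python
import numpy as np

def deeper_error_rate(pred, gt, max_range=3.0, rel_tol=0.05):
    """Percentage of in-range pixels whose prediction is too deep.

    pred, gt  : predicted and ground-truth depth in metres
    max_range : only pixels with 0 < gt <= max_range are evaluated
    rel_tol   : hypothetical relative tolerance before a prediction counts as inaccurate
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)

    in_range = (gt > 0) & (gt <= max_range)
    if not in_range.any():
        return float("nan")  # nothing to evaluate

    # Overestimated depth: the obstacle is believed to be farther than it is.
    too_deep = pred > gt * (1.0 + rel_tol)

    return 100.0 * float((too_deep & in_range).sum()) / float(in_range.sum())
```

Underestimated depths are deliberately not counted in this sketch: they make the vehicle more cautious rather than less, so they affect efficiency more than safety.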