Ph.D. Thesis

Ph.D. Student: Noonan John
Subject: Indoor Exploration with a Robotic Vehicle using a Single Camera and a Floorplan
Department: Department of Computer Science
Supervisors: Prof. Ehud Rivlin, Dr. Hector Rotstein


Intelligent systems that monitor buildings for security, manufacturing, or warehouse pack-and-ship activities need to quickly build a full visual scene representation so that remotely operating personnel can analyze it and make decisions. Because such systems are deployed frequently and regularly throughout the day, they require accurate indoor localization and fast generation of the full building's visual scene representation.

We present a new minimalistic approach to indoor exploration: minimal sensing, minimal prior map knowledge, and minimal underlying geometry needed to build a full visual scene representation. We use a single camera and a floorplan for both localization and building a neural scene representation of the explored building, with a small robotic vehicle carrying out the exploration. We also place a strong emphasis on the modularity of components. The beauty of this minimalistic approach is that it serves the application-driven motivation for this research: widespread use, easy and fast deployment, and reliance on ubiquitous information and equipment.

Our research combines the classical and deep learning worlds, harnessing the strengths of each where it thrives. We introduce a novel neural scene representation that scales view synthesis to full indoor buildings, describing the building with a space of local neural rendering functions, which facilitates infusing prior meta-knowledge into the learning. Knowledge of how to render the scene from various vantage points is shared by conditioning on similar building structure, accelerating learning across the full building. We demonstrate learning such a neural scene representation for view synthesis in around 15 minutes (versus days if prior work were applied directly) on a single commodity GPU, and rendering in real time at 64 Hz, allowing for immersive visual experiences.
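To make the idea of a space of local rendering functions concrete, the following is a toy sketch (our own illustration with assumed dimensions, not the thesis architecture): one shared MLP evaluates rendering queries everywhere in the building, conditioned on a per-region latent code derived from local building structure, so regions with similar structure share training signal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared MLP weights (illustrative sizes): input is a 3D point,
# a 2D viewing-direction parametrization, and an 8-dim structure code.
HIDDEN = 32
W1 = rng.standard_normal((HIDDEN, 3 + 2 + 8)) * 0.1
b1 = np.zeros(HIDDEN)
W2 = rng.standard_normal((4, HIDDEN)) * 0.1   # output: RGB + density
b2 = np.zeros(4)

def render_sample(x, d, z):
    """Evaluate the shared local rendering function at point x and view
    direction d, conditioned on the structure code z of x's region."""
    inp = np.concatenate([x, d, z])
    h = np.maximum(W1 @ inp + b1, 0.0)            # ReLU hidden layer
    out = W2 @ h + b2
    rgb = 1.0 / (1.0 + np.exp(-out[:3]))          # colors in [0, 1]
    sigma = np.log1p(np.exp(out[3]))              # non-negative density
    return rgb, sigma

# Two regions with similar structure would carry similar codes, so
# rendering knowledge learned in one transfers to the other.
codes = {"room_a": rng.standard_normal(8), "room_b": rng.standard_normal(8)}
rgb, sigma = render_sample(np.array([1.0, 2.0, 0.5]),
                           np.array([0.0, 1.0]),
                           codes["room_a"])
```

The key point of the design is that only the small per-region codes differ across the building; the rendering weights are shared, which is what amortizes learning over the full structure.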

Indoor exploration also requires accurate global positioning, so we investigate indoor localization with our minimalistic approach in a comprehensive manner. We formulate a core methodology for integrating a floorplan with a monocular camera, forming the basis for multiple positioning systems that solve for global position, orientation, and scale. We provide a theoretical analysis of the planarity criteria on visual features under which unique global positioning solutions can be recovered. We develop multiple algorithms for the necessary components of indoor localization: extracting planes from scale-ambiguous monocular 3D point clouds, associating extracted planes with floorplan walls, recovering the scale factor from wall-plane pairs, and integrating soft vehicle and floorplan constraints into an optimization that refines global poses.
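One piece of the pipeline above, scale recovery from wall-plane pairs, admits a simple closed form: the scale-ambiguous plane-to-camera distances from monocular SLAM should, after scaling, match the metric distances implied by the matched floorplan walls. A minimal least-squares sketch (our own illustration, not the thesis algorithm):

```python
import numpy as np

def recover_scale(plane_distances, wall_distances):
    """Least-squares estimate of the global scale factor s minimizing
    || s * plane_distances - wall_distances ||^2 over matched pairs,
    where plane_distances come from scale-ambiguous monocular SLAM and
    wall_distances are the metric distances implied by the floorplan."""
    p = np.asarray(plane_distances, dtype=float)
    w = np.asarray(wall_distances, dtype=float)
    return float(p @ w / (p @ p))

# Example: SLAM reports planes at (unitless) distances 2.0, 4.0, 1.0;
# the matched floorplan walls imply metric distances 1.0, 2.0, 0.5.
s = recover_scale([2.0, 4.0, 1.0], [1.0, 2.0, 0.5])   # → 0.5
```

Using several wall-plane pairs rather than one makes the estimate robust to noise in any single plane fit.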

We introduce multiple global positioning systems that are modular with respect to the underlying monocular SLAM algorithm, allowing our work to grow naturally with continued progress in the computer vision community. We demonstrate global localization with a very lightweight computational overhead of 0.1% relative to the underlying Bundle Adjustment optimization. We present both optimization-based and probabilistic approaches and evaluate them on custom-created synthetic datasets, simulation datasets, and real-world datasets recorded using a small robotic vehicle that we custom designed and built.