Technion - Israel Institute of Technology, Graduate School
Ph.D. Thesis
Ph.D. Student: Gal Mishne
Subject: Diffusion Nets and Manifold Learning for High-Dimensional Data Analysis in the Presence of Outliers
Department: Electrical Engineering
Supervisor: Professor Israel Cohen
Full Thesis Text: English Version


Abstract

In this thesis, we present a data-driven approach for the analysis of high-dimensional data in the presence of outliers. Specifically, we focus on geometry-based manifold learning techniques, which provide a new low-dimensional embedding of the data, in both supervised and unsupervised settings.


First, we present Diffusion Nets, a new deep-learning neural-network architecture. We approach deep learning from a manifold learning perspective by explicitly incorporating a manifold embedding of the data into the deep-learning framework, constructing a geometric autoencoder. We propose to incorporate the diffusion embedding of the training data into the training of the network and introduce new neural-network constraints that preserve the local geometry of the data. The Diffusion Nets architecture consists of two components, an encoder and a decoder, that learn the direct and inverse mappings between a high-dimensional dataset and its low-dimensional manifold embedding. Stacked together, the two form a deep autoencoder, which maps the data to itself, as seen through the embedding. These three networks (the encoder, the decoder, and the stacked autoencoder) enable us to solve three closely related problems in manifold learning: (a) out-of-sample function extension of the manifold embedding to new test points, (b) mapping points from the embedding to the data space (pre-image solution), and (c) anomaly detection on test data.
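To make the architecture concrete, the following is a minimal sketch of a geometric autoencoder of this kind. It assumes PyTorch, input data of dimension d, a precomputed k-dimensional diffusion embedding psi of the training set, and a simple squared-error penalty tying the code layer to psi; the exact network structure and geometric constraints used in the thesis may differ.

```python
# Minimal sketch of a Diffusion Nets-style geometric autoencoder.
# Assumptions: PyTorch; data x of shape (n, d); a precomputed diffusion
# embedding psi of shape (n, k); an MSE penalty anchoring the code layer
# to psi. This is illustrative, not the thesis' exact construction.
import torch
import torch.nn as nn

class GeometricAutoencoder(nn.Module):
    def __init__(self, d, k, hidden=128):
        super().__init__()
        # Encoder: high-dimensional data -> low-dimensional embedding
        self.encoder = nn.Sequential(nn.Linear(d, hidden), nn.ReLU(),
                                     nn.Linear(hidden, k))
        # Decoder: embedding -> data space (approximate pre-image)
        self.decoder = nn.Sequential(nn.Linear(k, hidden), nn.ReLU(),
                                     nn.Linear(hidden, d))

    def forward(self, x):
        z = self.encoder(x)
        return z, self.decoder(z)

def train_step(model, opt, x, psi, lam=1.0):
    """One step: reconstruction loss + penalty tying the code layer to psi."""
    opt.zero_grad()
    z, x_hat = model(x)
    loss = (nn.functional.mse_loss(x_hat, x)
            + lam * nn.functional.mse_loss(z, psi))
    loss.backward()
    opt.step()
    return loss.item()
```

In this sketch, the encoder alone provides an out-of-sample extension for new points, the decoder alone provides an approximate pre-image map, and a large reconstruction error of the stacked autoencoder can flag a test point as anomalous.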


In the second half of the thesis, we focus on geometry-based manifold learning techniques for unsupervised anomaly detection and supervised target detection in images. When applying manifold learning techniques to large datasets, or when extending from a training set to a test set, out-of-sample function extension methods are used to calculate the embedding of new points. We analyze the limitations of applying out-of-sample function extension for manifold learning in these scenarios and propose a robust solution. To address anomaly detection, we propose a multiscale approach that overcomes these limitations, learning a new representation for both the background and the outliers. We present and compare two new detection scores based on the noise-robust diffusion distance.
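As an illustration of a diffusion-distance-based detection score, the following is a minimal single-scale sketch. It assumes NumPy/SciPy, a Gaussian kernel with bandwidth eps, and a score defined as the mean diffusion distance from a test point to its nearest background points; the multiscale construction and the noise-robust scores developed in the thesis are more elaborate.

```python
# Minimal sketch of a diffusion-distance anomaly score.
# Assumptions: NumPy/SciPy; single-scale Gaussian kernel; score = mean
# diffusion distance to the nearest background points. Illustrative only.
import numpy as np
from scipy.spatial.distance import cdist

def diffusion_embedding(X, eps, k=10, t=1):
    """k-dimensional diffusion-maps embedding of the rows of X."""
    W = np.exp(-cdist(X, X, "sqeuclidean") / eps)       # affinity matrix
    P = W / W.sum(axis=1, keepdims=True)                 # row-stochastic Markov matrix
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-np.abs(vals))[1:k + 1]           # drop the trivial eigenvector
    return np.real(vecs[:, order] * vals[order] ** t)    # diffusion coordinates

def anomaly_score(psi_bg, psi_test, n_neighbors=5):
    """Score each test point by its mean diffusion distance to the background."""
    D = cdist(psi_test, psi_bg)                          # distances in diffusion space
    return np.sort(D, axis=1)[:, :n_neighbors].mean(axis=1)
```

Points lying far, in diffusion distance, from every background point receive high scores and are declared anomalous.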


Finally, we propose a new supervised metric for target detection, based on calculating a local model for each training point using its local neighborhood within the training set. We show that by controlling the measure of locality defining these neighborhoods, one can construct a metric that is invariant to perturbations in the appearance of the target. We show that this invariance has an intuitive meaning in the patch space, and we analyze how the local neighborhood construction yields invariance to implicit factors in the intrinsic parameter space. Incorporating this metric into a supervised graph framework, we construct a low-dimensional global embedding of the test set and present a new, efficient detection score. We demonstrate that our solutions are robust and independent of the imaging sensor, achieving impressive results on challenging remote sensing datasets.
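For illustration only, one plausible instantiation of such a locally adaptive metric builds a local PCA model from each training point's nearest neighbors and measures the distance of a test point to that model after discarding variation along the local principal directions, so the distance is insensitive to perturbations within the local subspace. The helper names and the use of local PCA below are assumptions, not the thesis' exact construction.

```python
# Minimal sketch of a locally-invariant metric for target detection.
# Assumptions: NumPy/SciPy; local PCA models fit from each training point's
# k nearest neighbors; distance ignores variation along the local principal
# directions. Illustrative instantiation, not the thesis' exact construction.
import numpy as np
from scipy.spatial.distance import cdist

def local_models(X_train, k=10, r=3):
    """Fit an r-dimensional local PCA model around each training point."""
    D = cdist(X_train, X_train)
    models = []
    for i in range(len(X_train)):
        nbrs = X_train[np.argsort(D[i])[:k]]             # k nearest training points
        mu = nbrs.mean(axis=0)
        _, _, Vt = np.linalg.svd(nbrs - mu, full_matrices=False)
        models.append((mu, Vt[:r]))                      # (local mean, local basis)
    return models

def invariant_distance(x, model):
    """Distance from x to a local model, ignoring the local subspace directions."""
    mu, V = model
    diff = x - mu
    residual = diff - V.T @ (V @ diff)                   # remove in-subspace component
    return np.linalg.norm(residual)
```

Taking the minimum of this distance over all training-point models yields a simple per-pixel detection score; in the thesis, such local models are further incorporated into a supervised graph framework to obtain a global embedding and detection score.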