M.Sc Thesis

M.Sc Student: Aloni Abuhasira Lior
Subject: Joint Geometric and Topological Analysis of Hierarchical Datasets
Department: Department of Electrical and Computer Engineering
Supervisors: Associate Prof. Omer Bobrowski


In a world abundant with diverse data arising from complex acquisition techniques, there is a growing need for new data analysis tools and frameworks.

Our focus is on high-dimensional data that are organized into several hierarchical datasets, where each dataset consists of complex samples, and every sample has a distinct irregular structure modeled by a graph.

Such datasets arise in many applications from a broad range of fields such as cytometry and gene expression in bioinformatics, social and computer network analysis, medical imaging, and geophysical tomography, to name but a few.

In this work, we combine two complementary data-analytic approaches and propose a hierarchical method for constructing informative representations of such datasets, along with a suitable metric. The method consists of two stages.

At the coarse stage, we employ Topological Data Analysis (TDA) to extract qualitative structural information from the datasets.

More concretely, we use persistent homology as a topological signature for each dataset. Briefly, persistent homology is a topological-algebraic tool that captures information about connectivity and holes at various scales. It is computed over the ensemble of samples contained in each dataset.
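
To make the idea of tracking connectivity across scales concrete, here is a minimal sketch of the simplest case: 0-dimensional persistent homology (connected components) of a Vietoris-Rips filtration built from a pairwise distance matrix. It is an illustration only, not the thesis implementation. The function name `h0_persistence` and the toy input are assumptions, and higher-dimensional features (holes) would require a dedicated TDA library.

```python
from itertools import combinations


def h0_persistence(dist):
    """Return the H0 persistence diagram as a list of (birth, death) pairs.

    dist is a symmetric matrix (list of lists) of pairwise distances.
    Every component is born at scale 0; a component dies at the length of
    the edge that merges it into another one. One component never dies.
    """
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process edges in order of increasing length, as in Kruskal's algorithm.
    edges = sorted((dist[i][j], i, j) for i, j in combinations(range(n), 2))
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj  # two components merge, so one of them dies at d
            deaths.append(d)
    return [(0.0, d) for d in deaths] + [(0.0, float("inf"))]


# Hypothetical toy input: three points on a line at positions 0, 1 and 5.
points = [0.0, 1.0, 5.0]
toy_dist = [[abs(a - b) for b in points] for a in points]
diagram = h0_persistence(toy_dist)
```

In this toy example the two nearby points merge at scale 1, and the distant point joins them at scale 4, so the long-lived component reflects the gap between the two clusters.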

The signature provided by persistent homology comes equipped with a natural metric, which allows us to compare the structure of different datasets.
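
The abstract does not name the metric. One common choice for comparing persistence diagrams is the bottleneck distance, and the sketch below assumes it purely for illustration. It is a brute-force version that tries every matching, so it is only suitable for very small diagrams with finite points. The names `bottleneck_distance`, `_cost`, and `DIAG` are hypothetical.

```python
from itertools import permutations

DIAG = None  # placeholder meaning "matched to the diagonal"


def _cost(p, q):
    """L-infinity cost of matching p to q, where either may be the diagonal."""
    if p is DIAG and q is DIAG:
        return 0.0
    if p is DIAG:
        return (q[1] - q[0]) / 2.0
    if q is DIAG:
        return (p[1] - p[0]) / 2.0
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def bottleneck_distance(a, b):
    """Bottleneck distance between two small diagrams of finite (birth, death) pairs.

    Each diagram is padded with diagonal slots so that any point may be left
    unmatched at the cost of its distance to the diagonal. Runs in factorial
    time, so it is meant for illustration only.
    """
    left = list(a) + [DIAG] * len(b)
    right = list(b) + [DIAG] * len(a)
    best = float("inf")
    for perm in permutations(range(len(right))):
        # The cost of a matching is its single worst pair.
        worst = max((_cost(left[i], right[j]) for i, j in enumerate(perm)),
                    default=0.0)
        best = min(best, worst)
    return best


# Two diagrams that differ by one short-lived feature.
d1 = [(0.0, 1.0), (0.0, 4.0)]
d2 = [(0.0, 4.0)]
gap = bottleneck_distance(d1, d2)
```

Here the long-lived point is matched exactly and the short-lived point is sent to the diagonal, so the distance is half its lifetime. Short-lived features therefore contribute little, which is why such metrics are stable under small perturbations of the data.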

At the fine stage, we use manifold learning techniques to facilitate quantitative structural analysis for each individual sample (modeled as a graph). This enables us to devise a distance between the samples within each dataset, which is the input of our persistent homology computation.

To demonstrate the advantages of our method, we apply it to Hyperspectral Imaging (HSI).

HSI is a sensing technique that captures the electromagnetic spectrum at each pixel of an image, with the purpose of finding objects, identifying materials, or detecting processes. We treat a single HSI image as a dataset, and the spectrum of each pixel as a single sample.
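
As a small illustration of this data layout, the sketch below flattens a hyperspectral cube of shape height x width x bands into a list of per-pixel spectra, one sample per pixel. The nested-list representation, the function name `cube_to_samples`, and the toy values are assumptions made for the example.

```python
def cube_to_samples(cube):
    """Flatten an H x W x B cube (nested lists) into H*W spectra of length B."""
    return [list(spectrum) for row in cube for spectrum in row]


# Hypothetical 2 x 2 image with 3 spectral bands per pixel.
toy_cube = [
    [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    [[0.7, 0.8, 0.9], [1.0, 1.1, 1.2]],
]
samples = cube_to_samples(toy_cube)  # the dataset: one spectrum per pixel
```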

We tested this pipeline on categorical HSI images of the Earth's surface acquired by NASA Jet Propulsion Laboratory's Airborne Visible InfraRed Imaging Spectrometer (AVIRIS), and showed that our unsupervised method separates the different categories well.

In addition, we showed that in a (supervised) classification task based on the attained (unsupervised) representation and metric, our method outperforms state-of-the-art methods, including deep-learning-based approaches.