M.Sc Thesis


M.Sc StudentKravi Ayelet
SubjectCorrelation Clustering With Overlaps
DepartmentDepartment of Computer Science
Supervisors ASSOCIATE PROF. Roy Schwartz
PROF. Joseph Naor
Full Thesis textFull thesis text - English Version


Abstract

The Overlapping Correlation Clustering problem has multiple real-world applications in diverse areas such as: social networks (community detection), biology (protein analysis) and information retrieval (categorizing documents).

Correlation Clustering is a problem of assigning data points into distinct groups (clusters) based on similarity.

When assigning two similar data points to the same cluster or assigning two dissimilar data points to different clusters, we score this assignment as successful, otherwise, we score the assignment as unsuccessful.

When solving the correlation clustering problem, two well studied objectives are to minimize the disagreement (the number of unsuccessful assignments) or to maximize the agreement (the number of successful assignments).

In Overlapping Correlation Clustering, we are allowing the clusters to overlap, meaning a single data point may belong to multiple clusters.

In the overlapping case, each data point is assigned to a (non-empty) set of clusters.

We use the Jaccard similarity coefficient (a function that quantifies the similarity between two given sets) to measure how successful a given assignment is.

In this work we study the Overlapping Correlation Clustering problem from a theoretical perspective and focus on the special case of finding an overlapping multiway-cut.

Using a linear relaxation, we obtain an O(e)-additive approximation algorithm, and an O(1e)-multiplicative approximation algorithm, for minimizing the worst disagreement.