M.Sc Thesis | |

M.Sc Student | Kravi Ayelet |
---|---|

Subject | Correlation Clustering With Overlaps |

Department | Department of Computer Science |

Supervisors | ASSOCIATE PROF. Roy Schwartz |

PROF. Joseph Naor | |

Full Thesis text |

The *Overlapping Correlation Clustering* problem has
multiple real-world applications in diverse areas such as: social networks
(community detection), biology (protein analysis) and information retrieval
(categorizing documents).

*Correlation Clustering* is a problem of assigning data
points into distinct groups (clusters) based on similarity.

When assigning two similar data points to the same cluster or assigning two dissimilar data points to different clusters, we score this assignment as successful, otherwise, we score the assignment as unsuccessful.

When solving the correlation clustering problem, two well studied objectives are to minimize the disagreement (the number of unsuccessful assignments) or to maximize the agreement (the number of successful assignments).

In *Overlapping Correlation Clustering*, we are allowing
the clusters to overlap, meaning a single data point may belong to multiple
clusters.

In the overlapping case, each data point is assigned to a (non-empty) set of clusters.

We use the Jaccard similarity coefficient (a function that quantifies the similarity between two given sets) to measure how successful a given assignment is.

In this work we study the *Overlapping Correlation Clustering*
problem from a theoretical perspective and focus on the special case of finding
an overlapping multiway-cut.

Using a linear relaxation, we obtain an *O(**e**)*-additive approximation algorithm, and an *O(1**e**)*-multiplicative approximation algorithm, for
minimizing the worst disagreement.