טכניון מכון טכנולוגי לישראל
הטכניון מכון טכנולוגי לישראל - בית הספר ללימודי מוסמכים  
M.Sc Thesis
M.Sc StudentGuy Ido
SubjectCluster Ranking with an Application to Mining
Mailbox Networks
DepartmentDepartment of Computer Science
Supervisor Mr. Ziv Bar-Yossef
Full Thesis textFull thesis text - English Version


Abstract

We initiate the study of a new clustering framework, called cluster ranking.  Rather than simply partitioning a network into clusters, a cluster ranking algorithm also orders the clusters by their strength. To this end, we introduce a novel strength measure for clusters - the integrated cohesion which is applicable to arbitrary weighted networks.


We then present C-Rank: a new cluster ranking algorithm. Given a network with arbitrary pairwise similarity weights, C-Rank creates a list of overlapping clusters and ranks them by their integrated cohesion. We provide extensive theoretical and empirical analysis of C-Rank and show that it is likely to have high precision and

recall.


A main component of C-Rank is a heuristic algorithm for finding sparse vertex separators. At the core of this algorithm is a new connection between the well known measure of vertex betweenness and multicommodity flow.


Our experiments focus on mining mailbox networks. A mailbox network is an egocentric social network, consisting of contacts with whom an individual exchanges email. Ties among contacts are represented by the frequency of their co­occurrence on message headers. C-Rank is well suited to mine such networks, since they

are abundant with overlapping communities of highly variable strengths. We demonstrate the effectiveness of C-Rank on the Enron data set, consisting of 130 mailbox networks.