|M.Sc Student||Grabovitch-Zuyev Irena|
|Subject||Entity search in Facebook|
|Department||Department of Computer Science|
|Supervisors||Mr. Ziv Bar-Yossef, Dr. Yaron Kanza|
Facebook is the most popular online social network, with more than a billion and a half active users. As such, it has become a significant source of information, and searching within its massive body of content has become a necessity. Nevertheless, very little research has been done on Facebook search, because collecting Facebook data is difficult for researchers who are not Facebook employees.
In this research we study entity search in Facebook. Queries are public entities, such as celebrities, companies, places, organizations, movies, and books. The entities are represented as Facebook pages. Search results (documents) are content items (posts, check-ins, status updates, shares, photos, groups) that are accessible to a particular Facebook user and are relevant to her query. Beyond directly addressing the user’s need to search her content in Facebook, entity search can be useful for recommending interesting content to the user and for generating a user model that can lead to better personalized services and ad targeting.
Searching within Facebook content is difficult because documents are short and rife with slang and other social network jargon. Our search algorithm tackles this challenge by using a rich representation of each query entity, including aliases in various languages and related entities and terms. This, together with aggressive stemming, allows our algorithm to retrieve even short and informal documents that refer to the entity by various nicknames in different languages. The rich entity representation is also used to score documents by their similarity to the entity’s related terms and to discard documents that either refer to other entities with the same name or refer to the entity only marginally.
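The retrieval idea described above can be illustrated with a minimal sketch. It is not the thesis implementation: the `Entity` class, the `crude_stem` suffix stripper, the overlap-based score, and the `threshold` value are all hypothetical choices made for illustration, and related terms are assumed to be single words.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical rich representation of a query entity."""
    name: str
    aliases: set = field(default_factory=set)        # nicknames, names in other languages
    related_terms: set = field(default_factory=set)  # single-word related terms (assumption)


def crude_stem(token):
    """Illustrative aggressive stemmer: lowercase and strip one common English suffix."""
    token = token.lower()
    for suffix in ("ing", "ers", "er", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tokenize(text):
    """Split text into word tokens and stem each one."""
    return [crude_stem(t) for t in re.findall(r"\w+", text)]


def mentions(entity, doc_tokens):
    """True if the entity name or any alias occurs as a contiguous token run."""
    for alias in {entity.name} | entity.aliases:
        alias_tokens = tokenize(alias)
        n = len(alias_tokens)
        if n and any(doc_tokens[i:i + n] == alias_tokens
                     for i in range(len(doc_tokens) - n + 1)):
            return True
    return False


def score(entity, doc_tokens):
    """Fraction of distinct document tokens that are related terms of the entity."""
    distinct = set(doc_tokens)
    if not distinct:
        return 0.0
    related = {crude_stem(t) for t in entity.related_terms}
    return len(distinct & related) / len(distinct)


def search(entity, documents, threshold=0.1):
    """Return documents that mention the entity and score at least `threshold`,
    ordered from highest to lowest score."""
    scored = []
    for doc in documents:
        tokens = tokenize(doc)
        if mentions(entity, tokens):
            s = score(entity, tokens)
            if s >= threshold:  # drop same-name or marginal mentions
                scored.append((s, doc))
    return [doc for s, doc in sorted(scored, key=lambda p: p[0], reverse=True)]
```

In this sketch, a document that mentions an alias but shares no related terms with the entity is discarded, which mirrors the disambiguation step in spirit only; the actual scoring function used in the thesis is not specified here.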
To generate a rich entity representation, we reconcile Facebook pages with Freebase entities and use the content of both the Facebook pages and the corresponding Freebase entries to derive aliases, names in different languages, and related terms. This reconciliation module could be of independent interest.
We evaluated our search algorithm on content collected from six Facebook accounts, covering items posted by 1,000 Facebook users. For almost all categories of entities, our algorithm achieves 88% precision and 75% recall of highly relevant items, and 66% recall when weakly relevant items are included. We compared our algorithm to a baseline method that achieves only 77% precision and a significantly lower recall of about 50%.
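For reference, the precision and recall figures above follow the standard set-based definitions, sketched below. The function name and the item identifiers are hypothetical, and the sketch does not reproduce the thesis's labeling or evaluation protocol.

```python
def precision_recall(retrieved, relevant):
    """Standard set-based precision and recall.

    retrieved: item ids returned by the search algorithm
    relevant:  item ids judged relevant by human labelers
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: recall changes with the relevance set used as ground truth.
retrieved = ["a", "b", "c", "d"]
highly_relevant = ["a", "b", "c", "e"]
highly_or_weakly_relevant = highly_relevant + ["f", "g"]

p_high, r_high = precision_recall(retrieved, highly_relevant)
p_all, r_all = precision_recall(retrieved, highly_or_weakly_relevant)
```

Enlarging the ground-truth set with weakly relevant items can only lower or preserve recall for a fixed result list, which is consistent with the lower recall reported when weakly relevant items are included.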