טכניון מכון טכנולוגי לישראל
הטכניון מכון טכנולוגי לישראל - בית הספר ללימודי מוסמכים  
M.Sc Thesis
M.Sc StudentNimrod Raifer
SubjectInformation Retrieval Meets Game Theory: The Ranking
Competition Between Documents Authors
DepartmentDepartment of Industrial Engineering and Management
Supervisors Full Professor Kurland Oren
Full Professor Tennenholtz Moshe
Full Thesis textFull thesis text - English Version


Abstract

In competitive search settings as the Web, there is an ongoing ranking competition between document authors (publishers) for certain queries. The goal is to have documents highly ranked, and the means is document manipulation applied in response to rankings. Existing retrieval models, and their theoretical underpinnings (e.g., the probability ranking principle), do not account for post-ranking corpus dynamics driven by this strategic behavior of publishers. However, the dynamics has major eect on retrieval eectiveness due to the availability of content in the corpus. Furthermore, while manipulation strategies observed over the Web were reported in past literature, they were not analyzed as on-going, and changing, post-ranking response strategies, nor were they connected to the foundations of classical ad hoc retrieval models (e.g., content-based document-query surface level similarities and document relevance priors). We present a novel theoretical and empirical analysis of the strategic behavior of publishers using these foundations. Empirical analysis of controlled ranking competitions that we organized reveals a key strategy of publishers: making their documents (gradually) become similar to documents ranked the highest in previous rankings. Our theoretical analysis of the ranking competition as a repeated game, and its minmax regret equilibrium, yields a result that supports the merits of this publishing strategy. We further show that it can be predicted with high accuracy, and without explicit knowledge of the ranking function, whether documents will be promoted to the highest rank in our competitions. The prediction utilizes very few features which quantify changes of documents, specifically with respect to those previously ranked the highest.