Technion - Israel Institute of Technology
Technion - Israel Institute of Technology - Graduate School
Ph.D. Thesis
Ph.D. Student: Goren Gregory
Subject: Strategic Information Retrieval - The Mediator and Agent Perspectives
Department: Department of Industrial Engineering and Management
Supervisors: Professor Oren Kurland, Professor Moshe Tennenholtz


Abstract

In the Web retrieval setting, many document authors are "ranking-incentivized". That is, they are interested in having their documents highly ranked for some queries by search engines. To this end, they often respond to rankings by modifying their documents (a practice known as search engine optimization). Hence, the retrieval setting is competitive.


A search engine can be considered as a mediator. It connects users with information needs (which are represented via queries) and document authors (agents) whose pages might satisfy the information needs of the users. This thesis tackles both the mediator perspective and the agent perspective in the competitive retrieval setting.


We start by defining and exploring the robustness of ranking functions to (possibly) adversarial document manipulations that authors apply. We define formal notions of robustness from pointwise and pairwise perspectives. Our analysis establishes a formal connection between the regularization of linear ranking functions and the robustness of the rankings they induce. Finally, we present listwise robustness metrics and describe an empirical exploration that provides support for our formal analysis.
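As a rough illustration of why the weight norm of a linear ranking function matters for robustness, the following sketch applies the Cauchy-Schwarz inequality: if a document's feature vector is perturbed by at most eps in L2 norm, its score under weights w can change by at most eps times the L2 norm of w. The function names, the L2 perturbation model, and the pairwise check are assumptions made for this example; they are not the formal robustness definitions developed in the thesis.

```python
import math

def score(w, x):
    """Linear ranking function: dot product of weights and document features."""
    return sum(wi * xi for wi, xi in zip(w, x))

def l2_norm(v):
    return math.sqrt(sum(a * a for a in v))

def max_score_shift(w, eps):
    """Upper bound on |score(w, x + d) - score(w, x)| over all d with ||d||_2 <= eps.

    Follows from Cauchy-Schwarz; the bound shrinks as ||w||_2 shrinks.
    """
    return l2_norm(w) * eps

def pair_order_is_stable(w, x_hi, x_lo, eps):
    """Sufficient (not necessary) condition for a pairwise order to survive.

    Assumes both documents may be perturbed by at most eps in L2 norm
    (an assumption of this sketch).
    """
    margin = score(w, x_hi) - score(w, x_lo)
    return margin > 2 * max_score_shift(w, eps)
```

For example, with w = [3.0, 4.0] and eps = 0.1 the bound is about 0.5, and it is attained by a perturbation in the direction of w. Note that the pairwise condition does not change when w is rescaled, since the margin scales together with the bound.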


Our next line of work is devising automated models for "white-hat" search engine optimization, that is, document modifications intended for rank promotion that maintain high-quality content. We introduce a model that takes as input a document and a query for which the document should be promoted in rankings. The method also observes the current and past rankings induced for the query. Then, the method decides which (short) segment (passage) of the document to replace with which passage of another document so that the resultant document is (i) likely to be more highly ranked for the query and (ii) coherent (i.e., of high quality). The merits of the model were demonstrated using both offline and online evaluation.
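The passage-replacement decision described above can be pictured with a toy sketch. Everything in it is hypothetical and chosen only for illustration: the scoring (gain in query-term occurrences plus a Jaccard-overlap proxy for coherence with neighboring passages), the function names, and the `coherence_weight` parameter. It does not use current or past rankings, and it is not the model proposed in the thesis.

```python
def tokens(text):
    return text.lower().split()

def query_term_count(passage, query):
    """Number of tokens in the passage that are query terms."""
    q = set(tokens(query))
    return sum(1 for t in tokens(passage) if t in q)

def jaccard(a, b):
    """Token-set overlap, used here as a crude stand-in for coherence."""
    sa, sb = set(tokens(a)), set(tokens(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def best_replacement(doc_passages, candidate_passages, query, coherence_weight=1.0):
    """Return (value, i, j): replace doc passage i with candidate passage j.

    value = gain in query-term count + coherence_weight * mean overlap
    between the candidate and the passages adjacent to position i.
    """
    best = None
    for i, old in enumerate(doc_passages):
        neighbors = doc_passages[max(0, i - 1):i] + doc_passages[i + 1:i + 2]
        for j, new in enumerate(candidate_passages):
            gain = query_term_count(new, query) - query_term_count(old, query)
            if neighbors:
                coherence = sum(jaccard(new, n) for n in neighbors) / len(neighbors)
            else:
                coherence = 0.0
            value = gain + coherence_weight * coherence
            if best is None or value > best[0]:
                best = (value, i, j)
    return best
```

Calling `best_replacement(doc_passages, candidate_passages, query)` returns the highest-scoring pair of positions under these toy criteria; a realistic system would replace both scoring terms with learned estimates.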


Our last line of work explores the possibility of a herding effect in a competitive search setting, in which agents choose similar strategies without explicit centralized direction. We explore the potential ability of search engines to shape the content of documents in a corpus in specific, pre-defined ways via herding. In particular, we explore the possibility of using the herding phenomenon to affect documents in terms of the topics they discuss, their relevance to information needs, their length, and the number of query terms they contain.