M.Sc Student | Mashiach Li-Tal |
---|---|

Subject | Local Approximation of PageRank and Reverse PageRank |

Department | Department of Computer Science |

Supervisor | Mr. Ziv Bar-Yossef |

Full Thesis text |

**
Local PageRank approximation: **The vast majority of algorithms for
computing PageRank have focused on global computation of the PageRank vector.
While in many** **applications of PageRank a global computation is needed,
there are** **situations in which one is interested in computing PageRank
scores** **for just a small subset of the nodes.

**Lower bounds:** We prove that local approximation
of PageRank, even to within modest approximation factors, is infeasible in the
worst-case, as it requires probing the link server for Omega(*n*) nodes,
where *n *is the size of the graph. The difficulty emanates from nodes of
high in-degree and/or from slow convergence of the PageRank random walk.

**Sufficiency:**
We observe that the algorithm of Chen, Gan, and Suel (CIKM 2004) works well for
graphs that have bounded in-degree and admit fast PageRank
convergence: if the PageRank random walk converges on the graph in r
steps and if the maximum in-degree of the graph is d, then the algorithm of
Chen et al. crawls a subgraph of size at most d^r and thus requires at most
this number of queries to the link server. When d and r are small, the
algorithm is efficient. This demonstrates that the two conditions we showed to
be necessary for fast local PageRank approximation are also sufficient.

**PageRank vs. Reverse PageRank:** By analyzing the stanford.edu crawl,
we show that the reverse web graph, like the web graph, admits quick PageRank
convergence (on 80% nodes of the reverse graph, PageRank converged within 20
iterations). We also show that the reverse graph has a bounded in-degree (only
255 as opposed to 38,606 in the regular graph). These findings hint that local
PageRank approximation should be feasible on the reverse graph.

To put this hypothesis to test, we measured the performance of the algorithm of

Chen et al. on a sample of nodes from the stanford.edu graph. We show that for highly ranked nodes the performance of the algorithm on the reverse graph is up to three times better than on the regular graph.

**Applications of Reverse PageRank**: We observe that RPR has a multitude
of applications. It has been used before to select good seeds for the TrustRank
measure, to detect highly influential nodes in social networks, and to find
hubs in the web graph. We present two additional novel applications of RPR: (1)
finding good seeds for crawling; and (2) measuring the “semantic relatedness”
of concepts in a taxonomy.