Technion - Israel Institute of Technology, Graduate School
Ph.D Thesis
Ph.D Student: Radinsky Kira
Subject: Learning to Predict the Future Using Web Knowledge and Dynamics
Department: Department of Computer Science
Supervisors: Professor Shaul Markovitch, Professor Nir Ailon
Full Thesis Text: English version


Abstract

Mark Twain famously said that "the past does not repeat itself, but it rhymes." In the spirit of this reflection, we present novel algorithms and methods for leveraging large-scale digital histories and human knowledge mined from the Web to make real-time predictions about the likelihoods of future human and natural events of interest.

The Web is a dynamic entity whose content is constantly updated and is entangled with sophisticated user behaviors and interactions. Some of these behaviors reflect current real-world trends, e.g., economic growth (automobile sales can be predicted from query volume), popular movies, and political unrest. We mine the ever-changing Web content and user Web behavior. We show not only that these dynamics can themselves be predicted, but also that they can be used to predict future real-world events.

We mine decades of news reports (1851-2010) from the New York Times (NYT) and describe how we can learn to predict the future by generalizing sets of concrete transitions in sequences of reported news events. In addition to the news corpora, we leverage data from freely available Web resources, including Wikipedia, FreeBase, OpenCyc, and GeoNames, via the LinkedData platform (Bizer et al., 2009). The goal is to build predictive models that generalize from specific sets of sequences of events to provide likelihoods of future outcomes, based on patterns of evidence observed in near-term Web activities. We propose the methods as a means of generating actionable forecasts in advance of the occurrence of target events in the world.

This thesis is one of the first works to demonstrate a general, unrestricted capacity for prediction by artificial intelligence. We present methods that draw on heterogeneous Web sources to perform knowledge-intensive reasoning about causality and to predict future events, using both automatic feature extraction and novel algorithms for generalizing over historical examples.