M.Sc Thesis

M.Sc StudentReiner Sarit
SubjectScheduling of Data Transcription in Periodically
Connected Databases
DepartmentDepartment of Industrial Engineering and Management
Supervisor PROF. Avigdor Gal


Contemporary data management in applications such as pervasive systems involves a periodic transcription of data onto secondary devices in a networked environment.  Using continuous communication holds high cost; it is unreliable; may cause server overload and sometimes infeasible during certain periods of time.  Such an environment may require periodic (rather than continuous) synchronization between the database and secondary copies, either due to paucity of resources (e.g. low bandwidth or limited time windows) or the transient characteristics of the connection. Hence the consistency of the information in secondary copies, with respect to the transcription origin, varies over time and depends on the rate of change of the base data and on the frequency of synchronization.  Our approach to evaluating the tradeoff between the transcription cost and the cost of obsolescence is to use modeling techniques from the field of stochastic processes.  This work presents a general model for data insertions on the server side, using compound nonhomogenous Poisson processes, and compares several transcription policies in terms of both transcription cost and obsolescence cost.  As part of this work we have developed an optimization algorithm, which is a simple coordinate-descent (or "hill-climbing") procedure for locally improving any continuous-time schedule with respect to the insertion model.  The optimization model can be combined with transcription policies results.  The principal novel contributions of this work are twofold.  The first is developing and testing a set of transcription policies.  The second is developing and testing an optimization algorithm which decreases transcription cost for all tested policies.  Specifically, these algorithms work explicitly and directly with the insertion model that includes both non-constant update intensity at the server and non-uniform time importance at the client.  Prior algorithms in this domain have lacked such a cost model, or did not make full use of it.