M.Sc Thesis

M.Sc StudentGilburd Baruch
SubjectA Privacy Model and Privacy-Preserving Algorithms for Data
Mining in Large-Scale Distributed Systems
DepartmentDepartment of Computer Science
Supervisor PROF. Assaf Schuster


Data privacy is a major threat to the widespread deployment of data grids in domains such as health care and finance. On the one hand, in such large-scale distributed systems, there is a great need for obtaining knowledge, usually by way of a data mining model. Data mining algorithms are a must for efficiently browsing such vast amounts of information. On the other hand, the individual database's privacy must be kept during this process. This calls for developing privacy preserving data mining algorithms which are suitable for large-scale distributed systems.

Privacy preserving data mining algorithms are just one example of a private multiparty computation process. Powerful feasibility results have been shown for this problem, demonstrating that any multi-party probabilistic polynomial-time functionality can be securely computed using a generic algorithm construction. Protocols obtained in this way are, however, inefficient, and thus, useless when a large number of participants are involved.

The contribution of this work is twofold. First, we describe an application of the accepted cryptographic framework that covers such dynamic and incremental execution models. Second, we show the efficacy of our model by employing k-privacy to introduce a novel technique for obtaining knowledge -- by way of an association-rule mining algorithm -- from large-scale data grids, while ensuring that the privacy is cryptographically maintained. The algorithm is asynchronous, involves no global communication patterns, and dynamically adjusts to new data or newly added participants. We also discuss how malicious participants might affect our algorithm.