Technion - Israel Institute of Technology
Technion - Israel Institute of Technology - Graduate School
Ph.D. Thesis
Ph.D. Student: Kravtsov Valentin
Subject: Service-Based Resource Brokering for Grid-Based Applications
Department: Department of Computer Science
Supervisor: Professor Assaf Schuster


Abstract

This work describes research on resource brokering in grid and cloud computing environments. In the first part we describe a system for resource brokering in purely opportunistic grid environments, where the user's applications are “embarrassingly parallel” and require no inter-task communication. In the second part we present a complete system for scheduling and co-allocation of tightly coupled jobs in quasi-opportunistic grid environments. In the third part we present a simple yet powerful methodology for application-agnostic diagnosis and remediation of performance hot spots in elastic multi-tiered client/server applications deployed as collections of black-box virtual machines.

We begin by describing the DataMiningGrid system and one of its key components, the DataMiningGrid Resource Broker. The DataMiningGrid system was designed to meet the requirements of modern, distributed data mining scenarios. It provides tools and services that facilitate the grid-enabling of data mining applications without any intervention on the application side. In particular, the DataMiningGrid Resource Broker facilitates the exploitation of different resources from various domains, giving data mining researchers the ability to access and utilize the resources needed for modern, distributed and computationally intensive data mining algorithms.

The second part of this thesis describes the QosCosGrid (Quasi-Opportunistic Supercomputing for Complex Systems) system and its decision-making module, the QosCosGrid Resource Broker. The QosCosGrid system is the first grid technology with the capability to harness the available grid resources and provide a service that is computationally equivalent to a supercomputer. The QosCosGrid system was built to support applications with large numbers of highly interconnected heterogeneous elements. Such applications are used to simulate real-world complex systems that typically exhibit non-linear behavior and emergence. Scheduling such large-scale, distributed, topology-aware applications requires considering not only the properties of the requested machines but also the properties of the machines' interconnections. This requirement severely complicates the scheduling process: even matching a single multiprocessor task to the available machines in a single time slot becomes an NP-complete problem with no polynomial-time approximation.
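To illustrate why interconnection constraints make the matching hard, the following is a minimal sketch of a topology-aware placement check. It is not the QosCosGrid Resource Broker's algorithm; the function name `find_placement`, the dictionary-based data layout, and the use of core counts and link bandwidth as the only constraints are all assumptions made for illustration. The sketch simply tries every one-to-one assignment of processes to machines, which is exponential in the number of processes.

```python
from itertools import permutations


def find_placement(task_cpu, task_links, machine_cpu, machine_bw):
    """Return a dict mapping each process to a machine, or None if no fit exists.

    All names and structures here are hypothetical:
    task_cpu:    {process: required cores}
    task_links:  {(p, q): required bandwidth between processes p and q}
    machine_cpu: {machine: available cores}
    machine_bw:  {(m, n): available bandwidth between machines m and n}
    """
    processes = list(task_cpu)
    machines = list(machine_cpu)

    def bandwidth(m, n):
        # Links are treated as undirected; a missing link has zero bandwidth.
        return machine_bw.get((m, n), machine_bw.get((n, m), 0))

    # Exhaustive search over one-to-one assignments of processes to machines.
    for chosen in permutations(machines, len(processes)):
        placement = dict(zip(processes, chosen))
        cpu_ok = all(machine_cpu[placement[p]] >= task_cpu[p] for p in processes)
        links_ok = all(
            bandwidth(placement[p], placement[q]) >= required
            for (p, q), required in task_links.items()
        )
        if cpu_ok and links_ok:
            return placement
    return None
```

Dropping `task_links` reduces the check to independent per-machine comparisons; it is the pairwise link requirements that tie the choices together and force a search over whole assignments.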

The third part of this thesis presents a simple yet powerful methodology for application-agnostic diagnosis and remediation of performance hot spots in elastic multi-tiered client/server applications deployed as collections of black-box virtual machines. The developed Network Analysis for Remediating Performance Bottlenecks (NAP) system identifies performance bottlenecks that might affect application performance and derives the remediation decisions that are most likely to alleviate performance degradation.