Technion - Israel Institute of Technology, Graduate School
M.Sc Thesis
M.Sc Student: Bar Pavel
Subject: Resource Management in Grid Environments
Department: Department of Computer Science
Supervisor: Professor Assaf Schuster


Abstract

Grid computing environments have become mission-critical components in research and industry, offering sophisticated solutions for exploiting large computing and storage resources across multiple geographic locations and administrative domains. Such grid resources are usually non-dedicated or opportunistic; as a consequence, users consume them on a “best effort” basis. However, many real-world supercomputing applications, such as computational fluid dynamics, weather forecasting, and complex system simulations, rely on coallocation of large numbers of reliable resources as well as on a static and stable execution environment. For such applications, the “best effort” quality of service provided by conventional opportunistic grids is inadequate.


Research in this area has produced the concept of quasi-opportunistic supercomputing, which enables the execution of demanding parallel applications on a very large number of non-dedicated resources in grid environments. However, quasi-opportunistic supercomputing systems require a sophisticated resource management system that supports advance reservation of resources, job scheduling, and non-trivial topology-aware coallocation, that is, the mapping of resource requests to available resources. Moreover, scheduling large-scale, distributed, topology-aware applications requires considering not only the properties of the requested machines but also the properties of the machines' interconnections. This requirement severely complicates the scheduling process: even matching a single multi-processor task to available machines in a single time slot becomes an NP-complete problem with no polynomial-time approximation.
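To make the matching problem concrete, the following is a minimal brute-force sketch of topology-aware coallocation for a single task in a single time slot. It is an illustration only and not one of the algorithms developed in the thesis; the function name `coallocate`, the dictionary-based data layout, and the sample figures are all assumptions introduced here. The search tries every injective assignment of processes to machines, which is exponential in the worst case and reflects why the exact problem is hard.

```python
from itertools import permutations


def coallocate(proc_cpu, proc_links, machine_cpu, machine_bw):
    """Return a process -> machine mapping that meets all constraints, or None.

    Hypothetical data layout (an assumption for this sketch):
      proc_cpu:    {process: required CPU cores}
      proc_links:  {(p, q): required bandwidth between processes p and q}
      machine_cpu: {machine: available CPU cores}
      machine_bw:  {frozenset({m, n}): available bandwidth between machines}
    """
    procs = list(proc_cpu)
    # Try every injective assignment of processes to distinct machines.
    for chosen in permutations(machine_cpu, len(procs)):
        mapping = dict(zip(procs, chosen))
        # Machine properties: each machine must offer enough cores.
        if any(machine_cpu[mapping[p]] < proc_cpu[p] for p in procs):
            continue
        # Interconnection properties: each required link must have enough
        # bandwidth between the two machines chosen for its endpoints.
        if all(
            machine_bw.get(frozenset((mapping[p], mapping[q])), 0) >= bw
            for (p, q), bw in proc_links.items()
        ):
            return mapping
    return None


# Illustrative request: two processes that need a fast link between them.
proc_cpu = {"a": 4, "b": 2}
proc_links = {("a", "b"): 10}
machine_cpu = {"m1": 2, "m2": 8, "m3": 4}
machine_bw = {
    frozenset(("m1", "m2")): 1,
    frozenset(("m2", "m3")): 100,
    frozenset(("m1", "m3")): 100,
}
print(coallocate(proc_cpu, proc_links, machine_cpu, machine_bw))
```

In this sample, machine `m1` has enough cores for process `b`, but its link to `m2` is too slow, so the search must move on to a different pair. Checking machine properties alone would have accepted the slower placement.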


In this work we propose a complete scheduling framework for multi-cluster, heterogeneous environments that provides, in practice, an efficient solution for the scheduling and coallocation of topology-aware applications. The framework is flexible: it is composed of pluggable components and can be easily configured to support a variety of scheduling policies. We also describe three novel scheduling and coallocation algorithms that were developed and plugged into the framework. The proposed scheduling framework was integrated into the QosCosGrid system, where it serves as the main decision-making module.
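One common way to realize pluggable scheduling policies is a registry of interchangeable ordering functions selected by name at configuration time. The sketch below illustrates that general pattern only; it does not reproduce the QosCosGrid interfaces, and the names `POLICIES`, `register`, `schedule`, the two sample policies, and the job fields are all hypothetical.

```python
from typing import Callable, Dict, List

Job = Dict[str, int]  # hypothetical job record: id, arrival time, runtime
Policy = Callable[[List[Job]], List[Job]]

# Registry of available policies, keyed by a configuration name.
POLICIES: Dict[str, Policy] = {}


def register(name: str) -> Callable[[Policy], Policy]:
    """Decorator that plugs a policy into the registry under `name`."""
    def wrap(fn: Policy) -> Policy:
        POLICIES[name] = fn
        return fn
    return wrap


@register("fifo")
def fifo(jobs: List[Job]) -> List[Job]:
    # Order jobs by arrival time.
    return sorted(jobs, key=lambda j: j["arrival"])


@register("shortest_first")
def shortest_first(jobs: List[Job]) -> List[Job]:
    # Order jobs by estimated runtime, shortest first.
    return sorted(jobs, key=lambda j: j["runtime"])


def schedule(jobs: List[Job], policy_name: str) -> List[int]:
    """Return job ids in the order chosen by the configured policy."""
    return [j["id"] for j in POLICIES[policy_name](jobs)]


jobs = [
    {"id": 1, "arrival": 0, "runtime": 50},
    {"id": 2, "arrival": 5, "runtime": 10},
]
print(schedule(jobs, "fifo"))
print(schedule(jobs, "shortest_first"))
```

With this layout, adding a new policy means registering one more function; the `schedule` entry point and its callers stay unchanged.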