Technion - Israel Institute of Technology, Graduate School
Ph.D Thesis
Ph.D Student: Shagin Konstantin
Subject: Execution of Monolithic Java Programs on Large Non-Dedicated Collections of Commodity Workstations
Department: Department of Computer Science
Supervisor: Professor Assaf Schuster
Full Thesis Text: English Version


Abstract

Interconnected commodity workstations may provide the high processing capacity required to solve many current computational problems. We present a parallel processing framework that allows available workstations to be utilized with minimal programming and management effort. This framework, which we call JavaSplit, is a distributed runtime environment for standard multithreaded Java programs. It is resilient to multiple node failures and therefore preserves the integrity of the computation when workstations abruptly terminate their participation. JavaSplit hides the distributed and unreliable nature of the underlying environment from the programmer and can execute preexisting Java programs.


Portability is one of JavaSplit's most notable features, and it distinguishes JavaSplit from existing distributed runtime environments that provide a single system image. Each JavaSplit node carries out its part of the computation using nothing but its local standard Java Virtual Machine (JVM). Hence, JavaSplit is not only as portable as the Java language itself, but can also execute a single application on a heterogeneous set of workstations. This portability is achieved by instrumenting the program bytecode to enable distributed execution and thread checkpointing. To support portable instrumentation of Java system classes (the library classes that are incorporated into the JVM), JavaSplit employs a novel instrumentation method, which we call the Twin Class Hierarchy approach.
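
The abstract does not describe how the Twin Class Hierarchy approach works. One way to picture it, as a minimal sketch under stated assumptions: instead of modifying system classes in place, the instrumenter produces a renamed copy of each class in a parallel package and redirects all references to that copy. The prefix `javasplit.` and the helper `twinOf` below are hypothetical names chosen for illustration, and the sketch shows only the name mapping, not the bytecode rewriting itself.

```java
import java.util.List;

public class TwinNames {
    // Hypothetical package prefix for the parallel ("twin") hierarchy;
    // an assumption for illustration, not taken from the abstract.
    public static final String TWIN_PREFIX = "javasplit.";

    // Maps a class name to the name of its twin. Names that already
    // belong to the twin hierarchy are returned unchanged.
    public static String twinOf(String className) {
        return className.startsWith(TWIN_PREFIX) ? className : TWIN_PREFIX + className;
    }

    public static void main(String[] args) {
        // System classes and user classes are mapped in the same way.
        for (String name : List.of("java.lang.Object", "java.util.ArrayList", "app.Worker")) {
            System.out.println(name + " -> " + twinOf(name));
        }
    }
}
```

Under this assumption, the original system classes stay untouched for the JVM's own use, while the instrumented program refers only to the twins.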


Scalability considerations played an important role in the design of our fault-tolerance scheme. As a result, neither failure-free execution nor recovery from a failure requires global cooperation of nodes. Moreover, recovery does not roll back non-failing nodes.


JavaSplit's checkpointing capabilities enable the use of speculative locking, an optimistic concurrency control mechanism that removes lock acquisition from the critical path of execution and enables concurrent execution of critical sections protected by the same lock. To the best of our knowledge, this is the first work to suggest the use of speculative locking in a general-purpose distributed runtime.
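
The general idea of checkpoint, speculate, validate, and roll back can be illustrated with a small single-JVM analogue. The sketch below is not JavaSplit's distributed protocol: it swaps in an ordinary compare-and-set on an `AtomicReference`, and the class `SpeculativeSection` and its methods are hypothetical names introduced for this example. A thread records the shared state as a checkpoint, runs the critical section without waiting for a lock, and commits only if no other thread changed the state in the meantime; otherwise it discards the result and retries.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.IntUnaryOperator;

public class SpeculativeSection {
    // Immutable snapshot of the shared state, tagged with a version number.
    private static final class Versioned {
        final long version;
        final int value;
        Versioned(long version, int value) { this.version = version; this.value = value; }
    }

    private final AtomicReference<Versioned> shared =
            new AtomicReference<>(new Versioned(0, 0));

    // Runs the critical section speculatively and returns the number of rollbacks.
    public int run(IntUnaryOperator criticalSection) {
        int rollbacks = 0;
        while (true) {
            Versioned checkpoint = shared.get();                         // checkpoint
            int result = criticalSection.applyAsInt(checkpoint.value);   // speculate
            Versioned next = new Versioned(checkpoint.version + 1, result);
            if (shared.compareAndSet(checkpoint, next)) {                // validate and commit
                return rollbacks;
            }
            rollbacks++;                                                 // conflict: discard and retry
        }
    }

    public int value() { return shared.get().value; }

    public static void main(String[] args) throws InterruptedException {
        SpeculativeSection counter = new SpeculativeSection();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) counter.run(v -> v + 1);
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        System.out.println("final value = " + counter.value());
    }
}
```

The critical section passed to `run` must be free of side effects outside the shared state, since a rolled-back attempt is simply re-executed.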