|M.Sc Student||Kama Alon|
|Subject||Transparent Fault-Tolerant Java Virtual Machine|
|Department||Department of Computer Science||Supervisor||Professor Roy Friedman|
Replication is one of the prominent approaches for obtaining fault tolerance. In a distributed environment, where computers are connected by a network, replication can be implemented by running multiple copies of a program concurrently on different computers. If one copy crashes, the others proceed normally and mask the failure.
Implementing replication as described above on commodity hardware and in a transparent fashion, i.e., without changing the programming model, poses many challenges. The choice of the level at which to implement replication (hardware, operating system, middleware, or application) has ramifications for development costs and for the portability of programs. Other difficulties lie in coordinating the copies in the face of non-determinism, such as I/O and environment differences (e.g., different clocks). Finally, overhead must be minimized so that performance remains acceptable.
We report on an implementation of transparent fault tolerance at the virtual machine level of Java. We describe the design of the system and present performance results that in certain cases are equivalent to those of non-replicated executions. We also discuss design decisions stemming from implementing replication at the virtual machine level, and the special considerations necessary in order to support Symmetric Multi-Processors (SMP).