M.Sc Thesis

M.Sc Student: Snapy Haim
Subject: Efficient Incorporation of Auxiliary Memory into a Hierarchical Memory System
Department: Department of Electrical and Computer Engineering
Supervisor: Associate Professor Yitzhak Birk


A computer's performance is determined mainly by its CPU and its memory. However, the two affect performance very differently: a weaker CPU (or a smaller share of the CPU for a given task) increases running time proportionally, whereas insufficient random access memory results in swapping, which can easily cause a 100-fold slowdown. Ensuring sufficient memory is therefore critical.
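To make this contrast concrete, the sketch below applies the standard effective-access-time model with hypothetical latencies: 100 ns for a DRAM access and 10 ms to service a page fault from disk. These figures are illustrative assumptions, not measurements from this work, and the model treats the workload as purely memory-bound.

```python
# Hypothetical latencies, for illustration only (not measured in this thesis).
T_MEM_NS = 100            # assumed DRAM access time
T_FAULT_NS = 10_000_000   # assumed time to service a page fault from disk (10 ms)


def effective_access_ns(fault_rate: float) -> float:
    """Effective access time when a fraction `fault_rate` of accesses
    miss main memory and must be served from the swap disk."""
    return (1 - fault_rate) * T_MEM_NS + fault_rate * T_FAULT_NS


def swap_slowdown(fault_rate: float) -> float:
    """Slowdown of a memory-bound workload relative to no swapping."""
    return effective_access_ns(fault_rate) / T_MEM_NS


def cpu_slowdown(cpu_share: float) -> float:
    """Running time grows in inverse proportion to the available CPU share."""
    return 1 / cpu_share


if __name__ == "__main__":
    print(f"half the CPU:             {cpu_slowdown(0.5):.1f}x")
    print(f"0.1% of accesses faulting: {swap_slowdown(0.001):.1f}x")
```

Under these assumptions, halving the CPU share doubles the running time, whereas a fault rate of only 0.1% already yields roughly a 101-fold slowdown.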

In a typical work environment comprising tens to hundreds of networked PCs and workstations, a given computer may at times need more memory than it has, while much of the main memory in most computers is unallocated at any given moment. Permitting computers to borrow memory from one another can therefore be both practical (idle memory is available) and extremely rewarding. Note that, unlike with shared memory, only physical memory is borrowed and no content is shared, so inter-computer memory coherence is not an issue.

Memory on other computers (remote memory) is typically accessed using disk semantics, treating it as a swap device that is fast relative to a disk. This approach is simple, but the associated overhead, which is negligible compared with disk access time, becomes a major cost when the device being accessed is memory.
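The argument can be illustrated with a simple ratio: the fraction of each page-in spent in a fixed software path rather than in the device itself. All three constants below are hypothetical values chosen only to show the trend; they are not taken from this work.

```python
# Hypothetical costs, for illustration only.
SWAP_PATH_OVERHEAD_US = 20.0    # assumed fixed software cost of one swap-in
DISK_SERVICE_US = 10_000.0      # assumed time for the disk to deliver one page
REMOTE_MEM_SERVICE_US = 5.0     # assumed time to fetch one page from remote memory


def overhead_share(service_us: float, overhead_us: float = SWAP_PATH_OVERHEAD_US) -> float:
    """Fraction of a swap-in spent in the fixed software path."""
    return overhead_us / (overhead_us + service_us)


if __name__ == "__main__":
    print(f"disk swap device:          {overhead_share(DISK_SERVICE_US):.1%}")
    print(f"remote-memory swap device: {overhead_share(REMOTE_MEM_SERVICE_US):.1%}")
```

With these assumed values the fixed path accounts for about 0.2% of a disk page-in but 80% of a remote-memory page-in, which is why reusing the disk path for remote memory wastes most of its speed advantage.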

In this study we present a design in which remote memory is viewed as an extension of main memory rather than as a swap device, and is accessed transparently to the application and, at times, even to the operating system. Because remote memory is accessed through a network interface card (NIC), which is an I/O device, addressing it in a cache-coherent fashion raises coherence issues.

Our design takes a novel approach to maintaining coherence between the (local) computer's cache and the I/O device through which remote memory is accessed, relying on two mechanisms that already exist in the OS: “Copy on Write” (COW) and “Dynamic Non-Uniform Memory Architecture” (DNUMA). The DNUMA mechanism migrates pages between native and auxiliary memory. The COW mechanism, in conjunction with DNUMA, first moves any data that is about to be modified to cache-coherent memory, so coherence is maintained. Together, these two mechanisms, which operate in the background, emulate an additional level in the memory hierarchy with minimal changes: no separate page table is required, hardware changes are minimal (permitting Machine Check Architecture bypass), OS changes are minor, and the application's code is unchanged. This method is estimated to be about 3 times faster than swapping for Store operations and about 6 times faster for Load operations.
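The interplay of COW and DNUMA described above can be sketched as a toy model. It assumes that pages residing in auxiliary (remote) memory are mapped read-only, that loads are served in place, and that the first store to such a page traps and migrates the page to native memory before the store is performed. The names `HybridMemory`, `map_auxiliary` and `_write_fault` are hypothetical, and eviction of pages back to auxiliary memory is deliberately left out.

```python
from dataclasses import dataclass, field
from enum import Enum


class Location(Enum):
    NATIVE = "native"        # local, cache-coherent DRAM
    AUXILIARY = "auxiliary"  # remote memory reached through the NIC (not cache-coherent)


@dataclass
class Page:
    data: bytearray
    location: Location
    writable: bool


@dataclass
class HybridMemory:
    """Toy model (hypothetical): auxiliary pages are write-protected, COW-style,
    so modified data is only ever held in native, cache-coherent memory."""
    native_free: int                           # free native page frames
    pages: dict = field(default_factory=dict)  # virtual page number -> Page
    migrations: int = 0

    def map_auxiliary(self, vpn: int, data: bytes) -> None:
        # A page placed in auxiliary memory starts out read-only.
        self.pages[vpn] = Page(bytearray(data), Location.AUXILIARY, writable=False)

    def load(self, vpn: int, offset: int) -> int:
        # Loads are served in place, from whichever memory holds the page.
        return self.pages[vpn].data[offset]

    def store(self, vpn: int, offset: int, value: int) -> None:
        page = self.pages[vpn]
        if not page.writable:
            self._write_fault(vpn)  # migrate before modifying
        page.data[offset] = value

    def _write_fault(self, vpn: int) -> None:
        page = self.pages[vpn]
        if self.native_free == 0:
            # A real system would first demote some page to auxiliary memory.
            raise MemoryError("no free native frame in this toy model")
        self.native_free -= 1
        page.data = bytearray(page.data)  # copy contents into a native frame
        page.location = Location.NATIVE
        page.writable = True
        self.migrations += 1
```

In this model, a `load` leaves a page in auxiliary memory, the first `store` migrates it, and later stores proceed without another fault. Keeping auxiliary pages read-only is what makes the model coherent: local stores never target the non-coherent memory, so they cannot leave stale data there.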