Technion - Israel Institute of Technology, Graduate School
M.Sc Thesis
M.Sc Student: Azriel Leonid
Subject: Peripheral Memory: Analysis of its Impact on Performance of
a General Purpose Computer System
Department: Department of Electrical Engineering
Supervisors: Professor Emeritus Uri Weiser
Professor Avi Mendelson
Full Thesis Text: English Version


Abstract

The “memory wall” has long been a major limit on the performance of computer systems. While latency was the main concern in the past, lack of bandwidth has now become a limiting factor as well, mainly because the growing number of cores per die and threads per core intensifies the pressure on memory bandwidth. In such an environment, any additional traffic on the memory bus, such as I/O data traffic, may lead to significant degradation of overall system performance.

The first part of this work presents an analysis of the impact of I/O traffic on the overall performance of current systems. The measurements presented here were taken from a high-performance server system as well as from a full-system simulator. We demonstrate that in certain applications I/O data occupies a major part of the memory bandwidth, which degrades their performance and increases energy waste. We also show that CPU power consumption increases as a result of I/O traffic.

In the second part of this work, we introduce the concept of Peripheral Memory, a software-controlled memory that resides in the I/O domain and can be used efficiently to offload I/O traffic from CPU memory. The Peripheral Memory is intended to handle ‘I/O Exclusive data’: data that is originated and consumed by the I/O domain and is not required for processing by the CPU. Using the gem5 full-system simulator, we show that in certain applications I/O Exclusive data can constitute up to 90% of memory bandwidth.
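
The definition of I/O Exclusive data above can be illustrated with a minimal sketch. It is not taken from the thesis: the `Transfer` record, its `producer` and `consumer` fields, and the sample trace are all hypothetical, chosen only to show how the share of memory traffic that is I/O Exclusive might be computed from a trace of transfers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """One hypothetical memory-bus transfer record (illustrative only)."""
    num_bytes: int
    producer: str  # "io" or "cpu": domain that wrote the data
    consumer: str  # "io" or "cpu": domain that later reads the data


def is_io_exclusive(t: Transfer) -> bool:
    # I/O Exclusive: originated and consumed by the I/O domain,
    # never required by the CPU for processing.
    return t.producer == "io" and t.consumer == "io"


def io_exclusive_share(transfers) -> float:
    """Fraction of all transferred bytes that are I/O Exclusive."""
    total = sum(t.num_bytes for t in transfers)
    if total == 0:
        return 0.0
    exclusive = sum(t.num_bytes for t in transfers if is_io_exclusive(t))
    return exclusive / total


# Made-up trace: two device-to-device transfers, one device-to-CPU
# transfer, and one CPU-only transfer.
trace = [
    Transfer(4096, "io", "io"),
    Transfer(4096, "io", "io"),
    Transfer(1024, "io", "cpu"),
    Transfer(1024, "cpu", "cpu"),
]
share = io_exclusive_share(trace)
```

Under these assumptions, `share` is the portion of memory traffic that a Peripheral Memory could in principle keep off the CPU memory bus; the numbers in the trace are invented and do not reproduce any measurement from the thesis.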

We use the full-system simulator to model the Peripheral Memory and show the potential performance improvement it can provide for I/O-intensive applications. We evaluate different configurations and show that the non-coherent split-traffic Peripheral Memory configuration is the most efficient for a typical I/O-intensive application. This configuration can provide a speedup of up to 4 times in I/O data transfer rate compared to the method that current applications use for dealing with I/O data. Finally, we propose an analytical model for calculating the contribution of the Peripheral Memory to system power consumption. Combining the model with measurements, we show that the Peripheral Memory can reduce system power consumption by tens of watts for I/O-intensive workloads.