M.Sc Thesis

M.Sc Student: Jammal Mtanes Loren
Subject: Analyzing the Interaction of I/O and Performance in
Department: Department of Electrical and Computer Engineering
Supervisor: Professor Emeritus Uri Weiser


In modern computer architectures such as Chip Multiprocessors (CMPs), many cores on the same chip share the same storage resource, so off-chip bandwidth is a potential bottleneck. Because many processes run in parallel on a multi-core machine, the demand for storage bandwidth increases. The problem is particularly severe for big-data workloads, which access the storage frequently. When the storage is shared among many CPUs and the off-chip bandwidth is insufficient, pressure on the I/O bus increases, which degrades performance.

Solid-state drives (SSDs) were developed to overcome the seek time of hard disk drives (HDDs), which improves performance. SSDs reduce access time, yet they are still not fast enough to serve modern multiprocessor systems in which many processes need to access the storage simultaneously.

Our aim in this work is to understand how the number of running processes affects performance in multiprocessor systems for programs that consist of both I/O and computational phases. Reading large amounts of data and performing computations on it is common behavior for a wide range of applications, such as extracting fields from databases and counting words in text files. We investigate the behavior of such applications by examining the interplay among the maximum bandwidth of the storage, the number of processes running in parallel, and the buffer size of the read system call. Understanding this interplay helps in choosing parameters that improve execution time and storage bandwidth consumption.

To this end, we developed an analytical model for estimating the execution time and the consumed storage bandwidth. This model, backed by simulations, can be used to study workload parameters and their performance properties. The model also covers extreme cases, such as programs with negligible I/O and programs with negligible computation. It depends on both program and architecture parameters: the program parameters are the number of running parallel processes, the I/O buffer size, and the number of computations, while the architecture parameters are the frequency, the number of executed instructions per second, and the storage bandwidth limit. Our goal is to use this model to optimize performance by determining the appropriate number of running processes and buffer size while keeping the architectural resources balanced.
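The abstract does not state the model's equations, so the sketch below is not the thesis model. It is a generic bottleneck (max-of-bounds) estimate, included only to illustrate how the listed parameters can interact. Every name in it is hypothetical: `Workload`, `Machine`, `single_process_time`, `estimate`, `balanced_procs`, the per-byte computation cost `instr_per_byte`, and the fixed per-call overhead `syscall_s` (which is the assumption that makes the buffer size matter). Frequency is folded into `ips`, taken as IPC times frequency. It also assumes one process per core and perfect overlap of one process's I/O with other processes' computation, so the estimate is optimistic.

```python
from dataclasses import dataclass
import math


@dataclass
class Workload:
    data_bytes: float      # bytes read by each process (hypothetical parameter)
    buffer_bytes: float    # buffer size passed to each read() call
    instr_per_byte: float  # computation per byte read (hypothetical parameter)


@dataclass
class Machine:
    ips: float             # instructions per second of one core (IPC x frequency)
    bw_max: float          # storage bandwidth limit, bytes/s
    syscall_s: float       # fixed cost of one read() call, seconds (hypothetical)


def single_process_time(w: Workload, m: Machine) -> float:
    """Time for one process that has the storage to itself (no contention)."""
    calls = w.data_bytes / w.buffer_bytes          # larger buffers -> fewer calls
    io_s = calls * m.syscall_s + w.data_bytes / m.bw_max
    compute_s = w.data_bytes * w.instr_per_byte / m.ips
    return io_s + compute_s                        # I/O and compute phases alternate


def estimate(n_procs: int, w: Workload, m: Machine) -> tuple[float, float]:
    """Return (execution time [s], consumed storage bandwidth [bytes/s]).

    Bottleneck bound: the run cannot finish faster than one process alone,
    nor faster than the shared storage can deliver all processes' data.
    """
    t_bw = n_procs * w.data_bytes / m.bw_max
    t = max(single_process_time(w, m), t_bw)
    return t, n_procs * w.data_bytes / t


def balanced_procs(w: Workload, m: Machine) -> int:
    """Smallest process count whose data transfer time reaches the single-process time."""
    return math.ceil(single_process_time(w, m) * m.bw_max / w.data_bytes)


if __name__ == "__main__":
    w = Workload(data_bytes=1e9, buffer_bytes=1e6, instr_per_byte=4)
    m = Machine(ips=1e9, bw_max=1e9, syscall_s=1e-5)
    for n in (1, 2, 4, 6, 8):
        t, bw = estimate(n, w, m)
        print(f"{n} procs: {t:.2f} s, {bw / 1e6:.0f} MB/s")
    print("storage saturated from", balanced_procs(w, m), "processes")
```

With these made-up numbers, consumed bandwidth grows linearly with the process count until about six processes. Beyond that point the storage limit dominates: execution time grows while bandwidth stays at `bw_max`. Under this sketch's assumptions, that knee is where adding processes stops helping.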