M.Sc Thesis

M.Sc Student: Fine Oren
Subject: Packing - Storage Media Workload Reduction via Exploitation of Object Access Correlation
Department: Department of Electrical and Computer Engineering
Supervisor: Associate Professor Yitzhak Birk


With current disk capacities approaching 1 TB, the usable disk capacity in many I/O-intensive applications is limited by the disk’s effective bandwidth. The situation is most acute for workloads such as online transaction processing (OLTP), which are characterized by accesses to small blocks, so that head-positioning overhead dominates. This is expected to hold for future MEMS-based storage devices as well. Reducing the number of independent media accesses required for a given workload is therefore of utmost importance. This can be done by “packaging” objects that are accessed in close time proximity: they are placed contiguously on the storage medium, and all of them are retrieved in a single access whenever any one of them is requested. (Subsequent requests for package members are served from memory.) Packaging thus combines proximal placement with prefetching, neither of which alone would solve the problem.
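The effect of packaging on the number of media accesses can be illustrated with a minimal sketch. It is not the thesis's packaging utility: the function name `count_media_accesses`, the `package_of` mapping, and the small LRU memory cache are all assumptions made for illustration, and every request is treated as a read.

```python
from collections import OrderedDict

def count_media_accesses(requests, package_of, cache_size):
    """Count independent media accesses for a stream of object requests.

    package_of (hypothetical) maps an object id to the tuple of objects
    stored contiguously with it; objects without an entry are unpackaged.
    A request that misses in memory costs one media access and brings the
    whole package into an LRU cache holding at most cache_size objects.
    """
    cache = OrderedDict()
    accesses = 0
    for obj in requests:
        if obj in cache:
            cache.move_to_end(obj)      # served from memory
            continue
        accesses += 1                   # one access fetches the whole package
        for member in package_of.get(obj, (obj,)):
            cache[member] = True
            cache.move_to_end(member)
        cache[obj] = True
        cache.move_to_end(obj)          # requested object is most recent
        while len(cache) > cache_size:
            cache.popitem(last=False)   # evict least recently used
    return accesses

requests = ["a", "b", "a", "b", "c"]
packaged = {"a": ("a", "b"), "b": ("a", "b")}
print(count_media_accesses(requests, {}, cache_size=4))        # no packaging
print(count_media_accesses(requests, packaged, cache_size=4))  # a and b packaged
```

In this toy trace, packaging `a` with `b` removes the separate access for `b`, because the first request for `a` already brought `b` into memory.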

This work focuses on deciding package composition based on access patterns observed at the storage level, without any prior knowledge. We study the correlations between accesses to different objects from the storage-layer perspective, regardless of whether the objects are otherwise related. We target workloads with a balanced read/write ratio, which makes replication irrelevant (as opposed to prior art, which has already proposed packaging, sometimes under a different name). We develop the concept of profitability for identifying potential packages according to their effect on the average disk load, as well as a heuristic for choosing among conflicting profitable packages.
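As a rough illustration of observing access correlation at the storage level, the sketch below counts how often two objects are accessed within a given time window of each other. This is only a raw co-access count, not the profitability measure or the selection heuristic developed in the thesis. The function name `co_access_counts`, the `(timestamp, object_id)` trace format, and the `window` parameter are assumptions.

```python
from collections import Counter, deque

def co_access_counts(trace, window):
    """Count pairs of distinct objects accessed within `window` time units.

    trace is assumed to be an iterable of (timestamp, object_id) tuples
    sorted by timestamp. Returns a Counter keyed by frozenset({x, y}).
    """
    recent = deque()                    # accesses still inside the window
    counts = Counter()
    for t, obj in trace:
        while recent and t - recent[0][0] > window:
            recent.popleft()            # drop accesses that are too old
        for _, other in recent:
            if other != obj:
                counts[frozenset((obj, other))] += 1
        recent.append((t, obj))
    return counts

trace = [(0.0, "a"), (0.1, "b"), (5.0, "a"), (5.05, "b"), (9.0, "c")]
for pair, n in co_access_counts(trace, window=0.5).items():
    print(sorted(pair), n)
```

Pairs with high counts would be natural candidates for packaging. Deciding whether a candidate actually reduces the average disk load is a separate question that this sketch does not address.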

Using our own purpose-built packaging utility in conjunction with the DiskSim disk simulator, we ran simulations on publicly available OLTP traces in order to put our approach to the test. The results show improvement in both positioning overhead and cache miss rate, with total service time improving by up to 30%, depending on the trace.