|M.Sc Student||Hagog Mustafa|
|Subject||Optimizing Techniques for Locality of Data Structures|
|Department||Department of Electrical Engineering||Supervisor||Dr. David Bernstein|
Poor data locality is one of the major reasons for inadequate performance in many modern computer programs. The hardware architects, compiler designers, and programmers all have been concerned with this problem, and much research effort has been put into solving it. Programs exhibit poor data locality either because the programmers are unaware of the cache existence or, in many cases, because the designers follow the conceptual organization of the data structure, contradicting the desired layout for better locality of data access. Even if the programmers want to organize their data structure in a cache-aware manner, it is difficult to locate the organization that gives better data locality of data access.
To overcome the data locality problem, the compiler designers use cache-aware optimizations, which improve the data-access locality of these programs. However, the focus of the compiler designers was the scientific code and matrix access. Only in the last decade did compiler designers begin to pay some attention to general-purpose programs, in which the data are grouped into objects and are defined by data structures. One of the main approaches to improving the data-access locality in general purpose programs is to change the memory layout of the data objects.
Our work analyzes programs at the source level by characterizing the accesses to the program structure’s fields using data profiling. It builds a graph that represents the access close proximity relationship between fields of a single data structure (we call it CPG). Then, it performs clustering and reordering algorithms on the CPG and recommends the C language programmers to split or change the order of the fields in the data structure definition. The recommended redefinition of the data structures have better data locality. Our experimental results show improvements of 5-30% in execution time and 10-40% reduction in data-cache miss ratio.