M.Sc Thesis, Department of Electrical Engineering
Supervisors: Dr. Mendelson Abraham; Assoc. Prof. Kolodny Avinoam
The ever-increasing performance gap between processor speed and memory access time makes better prefetching techniques essential. A good prefetch mechanism must be both accurate and timely: it should bring into the cache only data that will actually be needed, and that data should arrive just in time. So far, most hardware-based prefetching mechanisms have proven effective at covering relatively short latencies (e.g., L1 cache misses that hit in L2) but less effective at covering long latencies such as accesses to main memory. Previous work proposed the Dead Block Correlation Prefetcher, which was shown to be effective in hiding long memory access latencies but has several disadvantages, such as the need for a very large amount of storage for its internal bookkeeping data structures in order to be effective: 120MB in the worst case in our simulation environment. This work presents a new approach to hardware prefetching in which a stride-based mechanism is combined with the dead-block predictor to achieve high accuracy with a sufficiently early trigger time. Using cycle-accurate simulations of the proposed prefetcher, we show a 5% performance improvement over traditional hardware-based prefetchers (100% in the best case) using a reasonable amount of storage: 8KB in the simplest form. A less aggressive implementation using 256KB of storage achieves an improvement of 7%. These improvements become more substantial (13% and 19%, respectively) as the memory access time increases. The approach presented in this work can be combined with other well-established prefetching algorithms that address the different access patterns found in different applications, in order to achieve a long prefetching lookahead for any such access pattern.
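For readers unfamiliar with stride-based prefetching, the following is a minimal, generic sketch of a per-instruction stride detector with a configurable lookahead. It is not the prefetcher proposed in this thesis: the class names, the confidence threshold, and the lookahead distance are all hypothetical choices made for illustration, and the dead-block predictor that the thesis combines with the stride mechanism is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class StrideEntry:
    """State kept per load instruction (hypothetical layout)."""
    last_addr: int        # most recent address accessed by this instruction
    stride: int = 0       # last observed difference between addresses
    confidence: int = 0   # saturating count of consecutive repeats of the stride


class StridePrefetcher:
    """Toy stride detector, indexed by the program counter of the load."""

    def __init__(self, threshold: int = 2, lookahead: int = 4, max_conf: int = 3):
        self.table = {}              # pc -> StrideEntry
        self.threshold = threshold   # repeats required before prefetching
        self.lookahead = lookahead   # how many strides ahead to prefetch
        self.max_conf = max_conf     # saturation value of the counter

    def access(self, pc: int, addr: int):
        """Record one access; return an address to prefetch, or None."""
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = StrideEntry(last_addr=addr)
            return None

        delta = addr - entry.last_addr
        if delta != 0 and delta == entry.stride:
            # Same non-zero stride seen again: grow confidence.
            entry.confidence = min(entry.confidence + 1, self.max_conf)
        else:
            # Pattern changed: remember the new stride and start over.
            entry.stride = delta
            entry.confidence = 0
        entry.last_addr = addr

        if entry.confidence >= self.threshold:
            return addr + entry.stride * self.lookahead
        return None


if __name__ == "__main__":
    prefetcher = StridePrefetcher()
    for address in (0, 64, 128, 192, 256):
        print(address, prefetcher.access(0x400, address))
```

In this sketch a larger `lookahead` issues the prefetch further ahead of the demand access, which is what makes it possible to cover longer latencies, at the cost of fetching more useless data when the pattern breaks.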