|M.Sc Student||Friedman Eyal|
|Subject||Processor-to-Memory Non-Equidistant Network in a Many-Core|
|Department||Department of Electrical Engineering||Supervisor||Professor Ran Ginosar|
The performance of most digital systems today may be limited by the interconnect between logic and memory, rather than by the logic or the memory itself. As the number of cores on a silicon chip increases, keeping a data cache for each core becomes very expensive in terms of power and performance, because of the heavy workload of maintaining cache coherency. A practical approach for many-core processors is to use a data memory shared by all cores. Such an approach requires a complex core-to-memory interconnection network that can deliver very high bandwidth to support all the memory access requests of the cores.
The HyperCore many-core architecture uses a single clock and is based on three components: a shared on-chip memory consisting of many banks, a many-to-many equidistant core-to-memory interconnection network (memory access time is constant), and a hardware synchronization and scheduling unit. The architecture supports a task-oriented parallel programming model that extends the traditional serial programming model with simple constructs for handling massively parallel operations. Existing serial algorithms are adapted for parallel execution by identifying code segments that operate repetitively on data. All these repetitions are executed simultaneously on the parallel cores as one duplicable task, with the scheduler dispatching a single instance of the task to each core. The network enables every core to reach every memory bank.
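The idea of a duplicable task can be illustrated with a minimal Python sketch. This is not the HyperCore programming interface, which is not shown in this abstract. The names `scale_element`, `run_serial` and `run_duplicable` are hypothetical, a thread pool stands in for the cores, and a plain list stands in for the shared memory.

```python
from concurrent.futures import ThreadPoolExecutor


def scale_element(data, out, i):
    """One instance of the (hypothetical) duplicable task: handles element i only."""
    out[i] = 2 * data[i]


def run_serial(data):
    """Serial version: the repetitive code segment runs as an ordinary loop."""
    out = [0] * len(data)
    for i in range(len(data)):
        scale_element(data, out, i)
    return out


def run_duplicable(data, num_cores=4):
    """Parallel version: one task instance per element, handed out to workers.

    The thread pool plays the role of the scheduler and cores (an assumption
    made for illustration); `out` plays the role of the shared memory.
    """
    out = [0] * len(data)
    with ThreadPoolExecutor(max_workers=num_cores) as pool:
        # list() waits for all instances and re-raises any worker exception.
        list(pool.map(lambda i: scale_element(data, out, i), range(len(data))))
    return out
```

Each instance writes to a distinct element, so the instances are independent and can run in any order, which is what makes the loop body a candidate for duplication.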
This research studies a modified HyperCoreX architecture with a non-equidistant network between cores and memory banks. Increasing the frequency of the system clock allows shorter access times to nearer memory banks. We also vary the number of memory banks in the system. A trace-based architectural simulator was devised to investigate the effects of variable-latency, multi-cycle memory accesses. Six benchmark programs were employed, representing a wide variety of inherent parallelism, address distributions, access rates and data sharing. The results show that non-equidistant memory in HyperCoreX, together with the increase of the clock frequency by 8, can reduce the average memory access time by up to 61%. Varying the number of memory banks was found to have little effect.
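The kind of comparison such a simulator makes can be sketched with a toy model in Python. Everything here is an assumption made for illustration and is not taken from the thesis: cores and banks sit on a one-dimensional line, an access costs one fast cycle plus one per unit of distance, the frequency increase is read as a factor of 8, bank conflicts are ignored, and the trace is made up.

```python
SLOW_PERIOD = 8.0  # period of the original clock, arbitrary time units (assumed)
SPEEDUP = 8        # assumed reading of "increase of frequency by 8" as a factor
FAST_PERIOD = SLOW_PERIOD / SPEEDUP


def equidistant_cycles(core, bank):
    """Equidistant network: every access takes one slow cycle."""
    return 1


def non_equidistant_cycles(core, bank):
    """Hypothetical 1-D layout: 1 fast cycle plus 1 per unit of distance."""
    return 1 + abs(core - bank)


def average_access_time(trace, period, cycles_fn):
    """Mean access time over a trace of (core, bank) pairs."""
    total_cycles = sum(cycles_fn(core, bank) for core, bank in trace)
    return total_cycles * period / len(trace)


# Made-up trace: mostly near accesses, with one access to the farthest bank.
trace = [(0, 0), (0, 1), (3, 3), (7, 0), (2, 5), (4, 4)]

equi = average_access_time(trace, SLOW_PERIOD, equidistant_cycles)
non_equi = average_access_time(trace, FAST_PERIOD, non_equidistant_cycles)
reduction = 1 - non_equi / equi

print(f"equidistant: {equi:.2f}, non-equidistant: {non_equi:.2f}, "
      f"reduction: {reduction:.1%}")
```

In this model the farthest access (distance 7) takes 8 fast cycles, the same time as one slow cycle, so the non-equidistant network is never slower and the gain depends entirely on how often a core accesses nearby banks.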