|M.Sc Student||Lavro Anton|
|Subject||An EDGE Co-Processor for a RISC CPU: Architecture, Performance and Power Analysis|
|Department||Department of Electrical Engineering|
|Supervisors||Professor Avi Mendelson, Professor Yitzhak Birk|
The continued reliance on traditional Reduced Instruction Set Computing (RISC) Instruction Set Architectures (ISAs), including the μop-based x86, can be identified as one of the key factors impeding the performance scaling of general-purpose Central Processing Units (CPUs). RISC architectures are poorly adapted to out-of-order execution: they rely on power-hungry hardware to discover and exploit Instruction Level Parallelism (ILP), which severely constrains the architecture's scalability.
An important alternative to the RISC ISAs is the Explicit Dataflow Graph Execution (EDGE) ISA. In an EDGE ISA, instructions are bundled into large blocks, and within a block the instructions pass computation results directly to one another rather than storing them in the register file or in main memory. The task of discovering the ILP is delegated to the compiler, and at execution time instruction scheduling is reduced to checking for operand availability, which removes the scalability constraint.
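The execution model described above can be illustrated with a minimal Python sketch. It is not the TRIPS block format or any part of the thesis: the instruction encoding (`op`, `arity`, `targets`), the `run_block` function and the three-instruction example are all hypothetical, chosen only to show that an instruction fires as soon as its operands have arrived and then forwards its result directly to its consumers.

```python
from collections import deque
import operator

# Hypothetical opcode table; a real EDGE ISA defines its own instruction set.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def run_block(instructions, inputs):
    """Execute one block in dataflow order.

    instructions: list of dicts with keys
        "op"      - opcode name looked up in OPS
        "arity"   - number of operands the instruction waits for
        "targets" - list of (consumer_index, operand_slot) pairs
    inputs: list of (instruction_index, operand_slot, value) entering the block.
    Returns a dict mapping instruction index to its computed result.
    """
    operands = [dict() for _ in instructions]  # operands received so far
    ready = deque()                            # instructions with all operands
    results = {}

    def deliver(idx, slot, value):
        # The only scheduling decision: has every operand arrived?
        operands[idx][slot] = value
        if len(operands[idx]) == instructions[idx]["arity"]:
            ready.append(idx)

    for idx, slot, value in inputs:
        deliver(idx, slot, value)

    while ready:
        idx = ready.popleft()
        instr = instructions[idx]
        args = [operands[idx][s] for s in range(instr["arity"])]
        value = OPS[instr["op"]](*args)
        results[idx] = value
        # Forward the result straight to the consumers, with no register file.
        for consumer, slot in instr["targets"]:
            deliver(consumer, slot, value)
    return results

# Block computing (a + b) * (a - b) with a = 7, b = 3.
block = [
    {"op": "add", "arity": 2, "targets": [(2, 0)]},
    {"op": "sub", "arity": 2, "targets": [(2, 1)]},
    {"op": "mul", "arity": 2, "targets": []},
]
inputs = [(0, 0, 7), (0, 1, 3), (1, 0, 7), (1, 1, 3)]
results = run_block(block, inputs)
```

In this sketch the `add` and `sub` instructions have no dependence on each other and become ready independently; the `mul` becomes ready only after both have delivered their results. The parallelism is encoded in the `targets` lists, which in an EDGE design would be produced ahead of execution rather than discovered by the hardware.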
Such an architecture was successfully implemented by a team from the University of Texas at Austin within the TRIPS project. Along with its obvious advantages, TRIPS has some drawbacks that hinder its adoption as a general-purpose CPU. First, it introduces a new ISA, which is costly both technologically and economically. Second, TRIPS seems very well matched to highly parallel tasks, while its suitability for other tasks is questionable.
These issues can be addressed by using TRIPS as a co-processor beside a main RISC CPU rather than as a stand-alone engine. This approach was proposed and initially explored by Ishay Geller et al. Their proposal was to use dynamic binary translation to convert RISC code into EDGE code, with traces of RISC instructions serving as the translation units. Geller et al.'s work covered extracting traces from the execution stream and defining a trace predictor and a trace storage structure (cache).
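The core idea of translating a RISC trace into EDGE form can be sketched as replacing register names with direct producer-to-consumer links. The Python sketch below is an illustration only, not the translator of Geller et al. or of this thesis: the tuple-based trace format and the `trace_to_dataflow` function are hypothetical, and the sketch ignores memory operations, branches and memory ordering.

```python
def trace_to_dataflow(trace):
    """Convert a linear trace of register instructions into dataflow links.

    trace: list of (op, dest_reg, src_regs) tuples in program order
           (a hypothetical format chosen for this sketch).
    Returns (targets, live_ins, live_outs):
        targets[i] - list of (consumer_index, operand_slot) fed by instruction i
        live_ins   - list of (reg, consumer_index, operand_slot) for registers
                     read before any instruction in the trace writes them
        live_outs  - dict mapping each written register to the index of the
                     last instruction in the trace that writes it
    """
    last_writer = {}                    # register -> index of latest producer
    targets = [[] for _ in trace]
    live_ins = []
    for i, (_op, dest, srcs) in enumerate(trace):
        for slot, reg in enumerate(srcs):
            if reg in last_writer:
                # Value produced inside the trace: link producer to consumer.
                targets[last_writer[reg]].append((i, slot))
            else:
                # Value comes from outside the trace (read from the register file).
                live_ins.append((reg, i, slot))
        last_writer[dest] = i
    return targets, live_ins, last_writer

# Trace computing r3 = (r1 + r2) * (r1 - r2).
trace = [
    ("add", "r3", ("r1", "r2")),
    ("sub", "r4", ("r1", "r2")),
    ("mul", "r3", ("r3", "r4")),
]
targets, live_ins, live_outs = trace_to_dataflow(trace)
```

Here the intermediate values in `r3` and `r4` become direct links into the `mul` instruction, so only the live-in registers `r1` and `r2` and the final live-out values would need to touch the register file.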
This work continues the previous work and expands its scope. We develop a complete architecture for the RISC EDGE scheme. We address issues such as trace building and translation, mixing trace and non-trace code, refining the trace predictor, and memory ordering, among others. We perform a detailed performance and power analysis of the proposed architecture and identify its advantages and disadvantages relative to an out-of-order RISC machine. The principal conclusion is that the RISC EDGE paradigm is best suited to the multi-threaded domain of computing, where it offers substantial potential for performance improvement, while its power consumption remains a subject for future optimization.