Technion - Israel Institute of Technology
Technion - Israel Institute of Technology - Graduate School
M.Sc Thesis
M.Sc Student: Naveh Alon
Subject: Power Aware Scheduling in a Heterogeneous Processor
Department: Department of Electrical Engineering
Supervisors: Professor Emeritus Uri Weiser, Professor Ran Ginosar
Full thesis text: English Version


Abstract

Power and thermal budgets are major constraints on delivering compute performance in future client CPUs. Process technology trends indicate that transistor density will continue to increase, but so will the overhead of using those transistors to attain higher frequency or extract more Instruction Level Parallelism (ILP), which impedes power-efficient design. Many silicon providers therefore turn to parallel compute for performance, using the increased gate count to place multiple cores on the same die. Selecting the right core for such a multi-core CPU is a difficult challenge. Cores optimized for throughput (parallel) compute are smaller and more energy efficient than those designed for latency (single-threaded) compute, but have poor single-threaded performance. Since typical client applications are a mix of single-threaded and multi-threaded segments, previous work has proposed asymmetric multi-core CPUs in which each subset of cores is optimized for a specific “performance-task” or “power/performance level”, so that together they serve both single-threaded and multi-threaded applications well.

This work explores various scheduling policies for maximizing multi-threading performance on a thin form-factor Asymmetric Multi-Core Processor (ASMP) comprising two small and one large core clusters. Previous work has already explored asymmetric designs for energy efficiency, or for improved multi-threading performance at a given area, in high-end CPUs. Leveraging an ASMP for performance on a thin form-factor client system raises a different set of challenges. The available power budget is extremely low to begin with (8-12W in Ultrabooks and tablets, and going lower in the future). Accelerators, such as the graphics or media engines, may consume large portions of the total CPU power budget during different execution phases, causing large variations in the budget available to the compute cores. Scheduling policies therefore need to be devised to operate well throughout the full power range.

An asymmetric multi-threading power and performance model is introduced, allowing different operating points for the small and large core clusters. The frequency-voltage range is extended into the lower linear range, based on real CPU measurements. A novel scheduling policy is then proposed, based on each cluster's performance increase per incremental growth in power (d(Perf)/d(Power)) and leveraging the constant-voltage portion of the frequency curve. When applied, it yields a 10-28% performance gain over other policies in the most relevant power range (3-7W). The model is tested against code with varying levels of parallelism and is shown to work well even when only 50% of the code is parallel.
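
To make the d(Perf)/d(Power) idea concrete, the following is a minimal sketch of a greedy allocator that repeatedly gives the next frequency step to whichever cluster offers the largest performance gain per added watt. It is not the thesis's actual policy or model. Everything in it is an assumption made for illustration: the `Cluster` class, the frequency grid, the coefficients, the power model (capacitance times voltage squared times frequency, with voltage held constant below a hypothetical knee frequency), and the treatment of performance as a simple sum of cluster throughputs, which ignores the serial fraction of the code.

```python
from dataclasses import dataclass

# Hypothetical frequency grid in GHz (illustrative values, not from the thesis).
F_MIN, F_MAX, STEP = 0.4, 3.0, 0.1
N_STEPS = round((F_MAX - F_MIN) / STEP)


def freq(k):
    """Frequency in GHz at step index k on the grid."""
    return round(F_MIN + k * STEP, 3)


@dataclass
class Cluster:
    name: str
    perf_per_ghz: float    # assumed throughput per GHz
    cap: float             # assumed switched-capacitance coefficient
    v_min: float = 0.7     # assumed minimum voltage (V)
    f_knee: float = 1.0    # assumed end of the constant-voltage range (GHz)
    v_slope: float = 0.25  # assumed voltage rise per GHz above the knee

    def voltage(self, f):
        # Voltage stays at v_min up to the knee, then rises linearly.
        return self.v_min + self.v_slope * max(0.0, f - self.f_knee)

    def power(self, f):
        # Dynamic power only: C * V^2 * f (leakage is ignored in this sketch).
        return self.cap * self.voltage(f) ** 2 * f

    def perf(self, f):
        return self.perf_per_ghz * f


def total_power(clusters, steps):
    return sum(c.power(freq(steps[c.name])) for c in clusters)


def total_perf(clusters, steps):
    # Simplification: assumes fully parallel code, so throughputs add up.
    return sum(c.perf(freq(steps[c.name])) for c in clusters)


def allocate(clusters, budget_w):
    """Greedily raise the cluster with the best d(Perf)/d(Power) that still fits."""
    steps = {c.name: 0 for c in clusters}
    used = total_power(clusters, steps)
    if used > budget_w:
        raise ValueError("budget is below the minimum operating power")
    while True:
        best = None
        for c in clusters:
            k = steps[c.name]
            if k >= N_STEPS:
                continue  # already at maximum frequency
            d_power = c.power(freq(k + 1)) - c.power(freq(k))
            if used + d_power > budget_w:
                continue  # next step does not fit in the remaining budget
            gain = (c.perf(freq(k + 1)) - c.perf(freq(k))) / d_power
            if best is None or gain > best[0]:
                best = (gain, c.name, d_power)
        if best is None:
            return steps
        _, name, d_power = best
        steps[name] += 1
        used += d_power


# One large and two small clusters, with made-up coefficients.
CLUSTERS = [
    Cluster("large", perf_per_ghz=2.0, cap=2.0),
    Cluster("small0", perf_per_ghz=1.0, cap=0.6),
    Cluster("small1", perf_per_ghz=1.0, cap=0.6),
]
```

Calling `allocate(CLUSTERS, 5.0)` returns a step index per cluster, and `freq(k)` converts an index to GHz. Under these assumed numbers, the marginal gain of a cluster is constant while its voltage is constant, so the sketch first raises the clusters with the best performance per watt up to the knee and only then trades off the steeper, voltage-scaled steps against each other.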