Proposed novel power-aware CPU architectures, then implemented and evaluated them using circuit-level and software infrastructure.
Low-Power Set-Associative L1 Instruction Cache
• Proposed an early tag lookup technique to reduce the dynamic read energy of set-associative L1 instruction caches.
• Redesigned the instruction cache, BTB, branch predictor, and the instruction fetch stage of the experimental superscalar processor to support early tag lookup.
• Evaluated the new processor’s performance, the overhead of the proposed technique, and the area, access time, and read/write energy of the new instruction cache.
Dynamic Core Scaling for Performance and Energy Trade-Off
• Proposed dynamic core scaling, which scales the pipeline resources of superscalar processors (front-end width, issue width, and the sizes of the issue queue, load/store queue, and ROB) to trade off performance and energy.
• Implemented dynamic core scaling on a FabScalar-generated RTL superscalar core by modifying several pipeline stages, including fetch, issue, memory, and retire.
• Implemented a store-set memory dependence predictor, various two-level branch predictors, and an LSU that can process multiple loads/stores per cycle on the RTL processor.
• Applied clock gating, synthesized the new reconfigurable processor, performed timing and power analysis on the circuit implementation, and evaluated performance and energy using SPEC benchmarks.
Adaptive Front-End Throttling for Superscalar Processors
• Proposed an adaptive front-end throttling technique that dynamically adjusts the instruction delivery bandwidth of wide-issue superscalar processors to improve energy efficiency.
• Implemented the proposed technique on a FabScalar-generated RTL superscalar core by modifying the core’s fetch, decode, rename, dispatch, issue, memory, and retire pipeline stages.
• Designed a two-level non-blocking cache, implemented it in RTL, and integrated it with the FabScalar core.