TY - GEN
T1 - A low-power misprediction recovery mechanism
AU - Ye, Jiongyao
AU - Watanabe, Takahiro
PY - 2009
Y1 - 2009
N2 - In modern superscalar processors, the branch misprediction penalty has become a critical factor in overall processor performance. Previous research proposed dual-path (or multi-path) execution methods to reduce the misprediction penalty, but these methods are complex and consume considerable power, largely because they fetch and execute instructions from multiple paths simultaneously. In this paper, we reduce branch misprediction penalties while balancing complexity, power, and performance. We present a novel technique, the Decode Recovery Cache (DRC), that reduces the misprediction penalty while taking both complexity and power consumption into account. The DRC stores decoded instructions that are mispredicted. On subsequent mispredictions, a hit in the DRC reduces the pipeline refill time and eliminates the instruction re-fetch and its subsequent decoding. Bypassing both re-fetching and re-decoding reduces processor power. Experimental results on the SPECint 2000 benchmarks show that a processor with the DRC improves IPC by 10.4% on average over a traditional processor and reduces average power consumption by 62.6% compared with Dual Path Instruction Processing.
AB - In modern superscalar processors, the branch misprediction penalty has become a critical factor in overall processor performance. Previous research proposed dual-path (or multi-path) execution methods to reduce the misprediction penalty, but these methods are complex and consume considerable power, largely because they fetch and execute instructions from multiple paths simultaneously. In this paper, we reduce branch misprediction penalties while balancing complexity, power, and performance. We present a novel technique, the Decode Recovery Cache (DRC), that reduces the misprediction penalty while taking both complexity and power consumption into account. The DRC stores decoded instructions that are mispredicted. On subsequent mispredictions, a hit in the DRC reduces the pipeline refill time and eliminates the instruction re-fetch and its subsequent decoding. Bypassing both re-fetching and re-decoding reduces processor power. Experimental results on the SPECint 2000 benchmarks show that a processor with the DRC improves IPC by 10.4% on average over a traditional processor and reduces average power consumption by 62.6% compared with Dual Path Instruction Processing.
UR - http://www.scopus.com/inward/record.url?scp=77949628805&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77949628805&partnerID=8YFLogxK
U2 - 10.1109/PRIMEASIA.2009.5397409
DO - 10.1109/PRIMEASIA.2009.5397409
M3 - Conference contribution
AN - SCOPUS:77949628805
SN - 9781424446698
T3 - 1st Asia Pacific Conference on Postgraduate Research in Microelectronics and Electronics, PrimeAsia 2009
SP - 209
EP - 212
BT - 1st Asia Pacific Conference on Postgraduate Research in Microelectronics and Electronics, PrimeAsia 2009
T2 - 1st Asia Pacific Conference on Postgraduate Research in Microelectronics and Electronics, PrimeAsia 2009
Y2 - 19 November 2009 through 21 November 2009
ER -