TY - GEN
T1 - Pseudo Dual Path Processing to reduce the branch misprediction penalty in embedded processors
AU - Zhao, Huatao
AU - Ye, Jiongyao
AU - Sun, Yuxin
AU - Watanabe, Takahiro
PY - 2013/1/1
Y1 - 2013/1/1
N2 - In modern embedded processors, superscalar techniques and deep pipeline architectures are widely used to achieve higher performance, but the branch misprediction penalty remains a significant constraint on system performance. As pipeline depth increases, re-filling the pipeline becomes a major contributor to the branch misprediction penalty. In this paper, we propose a new mechanism, named Pseudo Dual Path Processing (PDPP), to reduce the branch misprediction penalty. The mechanism uses a small trace cache to store a set of successive decoded instructions and related renaming information from the alternative path, so that those instructions can skip the fetch and decode stages on a trace cache hit, and the renaming of all instructions from the alternative path can be completed in one cycle by using the renaming information stored in advance. Therefore, PDPP barely reduces the effective bandwidth of the front-end stages while processing instructions from two paths, yet it reduces the re-fill penalty without increasing design complexity or power consumption. In addition, critical path prediction is employed to improve the efficiency of PDPP by preventing non-critical branches from being forked. The experimental results show that the proposed PDPP improves IPC by 7.43% compared to a conventional processor.
AB - In modern embedded processors, superscalar techniques and deep pipeline architectures are widely used to achieve higher performance, but the branch misprediction penalty remains a significant constraint on system performance. As pipeline depth increases, re-filling the pipeline becomes a major contributor to the branch misprediction penalty. In this paper, we propose a new mechanism, named Pseudo Dual Path Processing (PDPP), to reduce the branch misprediction penalty. The mechanism uses a small trace cache to store a set of successive decoded instructions and related renaming information from the alternative path, so that those instructions can skip the fetch and decode stages on a trace cache hit, and the renaming of all instructions from the alternative path can be completed in one cycle by using the renaming information stored in advance. Therefore, PDPP barely reduces the effective bandwidth of the front-end stages while processing instructions from two paths, yet it reduces the re-fill penalty without increasing design complexity or power consumption. In addition, critical path prediction is employed to improve the efficiency of PDPP by preventing non-critical branches from being forked. The experimental results show that the proposed PDPP improves IPC by 7.43% compared to a conventional processor.
UR - http://www.scopus.com/inward/record.url?scp=84901336245&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84901336245&partnerID=8YFLogxK
U2 - 10.1109/ASICON.2013.6811990
DO - 10.1109/ASICON.2013.6811990
M3 - Conference contribution
AN - SCOPUS:84901336245
SN - 9781467364157
T3 - Proceedings of International Conference on ASIC
BT - 2013 IEEE 10th International Conference on ASIC, ASICON 2013
PB - IEEE Computer Society
T2 - 2013 IEEE 10th International Conference on ASIC, ASICON 2013
Y2 - 28 October 2013 through 31 October 2013
ER -