TY - GEN
T1 - Multicore Cache Coherence Control by a Parallelizing Compiler
AU - Kasahara, Hironori
AU - Kimura, Keiji
AU - Adhi, Boma A.
AU - Hosokawa, Yuhei
AU - Kishimoto, Yohei
AU - Mase, Masayoshi
N1 - Funding Information:
Masayoshi Mase and Yohei Kishimoto are currently working for Hitachi, Ltd. and Yahoo Japan Corp., respectively. Their work contained in this paper was part of their studies at Waseda University. Boma Anantasatya Adhi is a staff member at Universitas Indonesia and is currently a PhD student at Waseda University, supported by a Hitachi Scholarship.
Publisher Copyright:
© 2017 IEEE.
PY - 2017/9/7
Y1 - 2017/9/7
AB - Recent developments in multicore technology have enabled processors with hundreds or thousands of cores. However, on such multicore processors, an efficient hardware cache coherence scheme becomes very complex and expensive to develop. This paper proposes a parallelizing-compiler-directed software coherence scheme for shared-memory multicore systems without hardware cache coherence control. The general idea of the proposed method is that an automatic parallelizing compiler analyzes the control and data dependencies among coarse-grain tasks in the program. Then, based on the obtained information, task parallelization, false sharing detection, and data restructuring to prevent false sharing are performed. Next, the compiler inserts cache control code to handle the stale data problem. The proposed method is built on the OSCAR automatic parallelizing compiler and evaluated on the Renesas RP2, a processor with 8 SH-4A cores. The hardware cache coherence scheme on the RP2 processor is available only for up to 4 cores, and hardware cache coherence can be turned off completely for a non-coherent cache mode. Performance is evaluated using 10 benchmark programs from SPEC2000, SPEC2006, NAS Parallel Benchmarks (NPB), and MediaBench II. The proposed method performs as well as or better than the hardware cache coherence scheme. For example, 4 cores with the hardware coherence mechanism gave speedups over 1 core of 2.52 times for SPEC2000 'equake', 2.9 times for SPEC2006 'lbm', 3.34 times for NPB 'cg', and 3.17 times for the MediaBench II MPEG2 Encoder. The proposed software cache coherence control gave 2.63 times for 4 cores and 4.37 times for 8 cores for 'equake', 3.28 times for 4 cores and 4.76 times for 8 cores for 'lbm', and 3.71 times for 4 cores and 4.92 times for 8 cores for 'MPEG2 Encoder'.
KW - Cache
KW - Multicore
KW - Parallelizing Compiler
KW - Shared Memory
KW - Software Coherence Control
UR - http://www.scopus.com/inward/record.url?scp=85031909144&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85031909144&partnerID=8YFLogxK
U2 - 10.1109/COMPSAC.2017.174
DO - 10.1109/COMPSAC.2017.174
M3 - Conference contribution
AN - SCOPUS:85031909144
T3 - Proceedings - International Computer Software and Applications Conference
SP - 492
EP - 497
BT - Proceedings - 2017 IEEE 41st Annual Computer Software and Applications Conference, COMPSAC 2017
A2 - Demartini, Claudio
A2 - Conte, Thomas
A2 - Nakamura, Motonori
A2 - Lung, Chung-Horng
A2 - Zhang, Zhiyong
A2 - Hasan, Kamrul
A2 - Reisman, Sorel
A2 - Liu, Ling
A2 - Claycomb, William
A2 - Takakura, Hiroki
A2 - Yang, Ji-Jiang
A2 - Tovar, Edmundo
A2 - Cimato, Stelvio
A2 - Ahamed, Sheikh Iqbal
A2 - Akiyama, Toyokazu
PB - IEEE Computer Society
T2 - 41st IEEE Annual Computer Software and Applications Conference, COMPSAC 2017
Y2 - 4 July 2017 through 8 July 2017
ER -