TY - JOUR
T1 - Static Coarse Grain Task Scheduling with Cache Optimization Using OpenMP
AU - Nakano, Hirofumi
AU - Ishizaka, Kazuhisa
AU - Obata, Motoki
AU - Kimura, Keiji
AU - Kasahara, Hironori
N1 - Funding Information:
Part of this research was supported by the METI/NEDO Millennium Project IT21 "Advanced Parallelizing Compiler" and the STARC "Compiler cooperative single chip multiprocessor" project.
PY - 2003/6
Y1 - 2003/6
N2 - Effective use of cache memory is becoming increasingly important as the gap between processor speed and memory access speed widens. Exploiting multigrain parallelism is also becoming important for improving effective performance beyond the limits of loop-iteration-level parallelism. With these factors in mind, this paper proposes a static coarse grain task scheduling scheme with cache optimization. At compile time, after decomposing tasks and data with the cache size taken into account, the proposed scheme schedules coarse grain tasks to threads so that data shared among them can be passed via the cache. The scheme is implemented in the OSCAR Fortran multigrain parallelizing compiler and evaluated on a Sun Ultra80 four-processor SMP workstation using Swim and Tomcatv from SPEC fp 95. On 4 processors, the proposed scheme achieves speedups of 4.56 times for Swim and 2.37 times for Tomcatv over the Sun Forte HPC Ver. 6 update 1 loop parallelizing compiler.
AB - Effective use of cache memory is becoming increasingly important as the gap between processor speed and memory access speed widens. Exploiting multigrain parallelism is also becoming important for improving effective performance beyond the limits of loop-iteration-level parallelism. With these factors in mind, this paper proposes a static coarse grain task scheduling scheme with cache optimization. At compile time, after decomposing tasks and data with the cache size taken into account, the proposed scheme schedules coarse grain tasks to threads so that data shared among them can be passed via the cache. The scheme is implemented in the OSCAR Fortran multigrain parallelizing compiler and evaluated on a Sun Ultra80 four-processor SMP workstation using Swim and Tomcatv from SPEC fp 95. On 4 processors, the proposed scheme achieves speedups of 4.56 times for Swim and 2.37 times for Tomcatv over the Sun Forte HPC Ver. 6 update 1 loop parallelizing compiler.
KW - Cache optimization
KW - Coarse grain task parallelization
KW - OpenMP
KW - Scheduling algorithm
UR - http://www.scopus.com/inward/record.url?scp=0346502797&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0346502797&partnerID=8YFLogxK
U2 - 10.1023/A:1023038702472
DO - 10.1023/A:1023038702472
M3 - Article
AN - SCOPUS:0346502797
SN - 0885-7458
VL - 31
SP - 211
EP - 223
JO - International Journal of Parallel Programming
JF - International Journal of Parallel Programming
IS - 3
ER -