TY - GEN
T1 - Static coarse grain task scheduling with cache optimization using OpenMP
AU - Nakano, Hirofumi
AU - Ishizaka, Kazuhisa
AU - Obata, Motoki
AU - Kimura, Keiji
AU - Kasahara, Hironori
PY - 2002
Y1 - 2002
N2 - Effective use of cache memory is becoming more important as the gap between processor speed and memory access speed widens. Multigrain parallelism is likewise becoming more important for improving effective performance beyond the limits of loop-iteration-level parallelism. With these factors in mind, this paper proposes a static coarse grain task scheduling scheme that accounts for cache optimization. After decomposing tasks and data at compile time with the cache size in mind, the proposed scheme schedules coarse grain tasks to threads so that data shared among coarse grain tasks can be passed via cache. It is implemented in the OSCAR Fortran multigrain parallelizing compiler and evaluated on a Sun Ultra80 four-processor SMP workstation using Swim and Tomcatv from SPECfp95. The proposed scheme achieves a 4.56 times speedup for Swim and a 2.37 times speedup for Tomcatv on 4 processors, both relative to the Sun Forte HPC 6 loop parallelizing compiler.
AB - Effective use of cache memory is becoming more important as the gap between processor speed and memory access speed widens. Multigrain parallelism is likewise becoming more important for improving effective performance beyond the limits of loop-iteration-level parallelism. With these factors in mind, this paper proposes a static coarse grain task scheduling scheme that accounts for cache optimization. After decomposing tasks and data at compile time with the cache size in mind, the proposed scheme schedules coarse grain tasks to threads so that data shared among coarse grain tasks can be passed via cache. It is implemented in the OSCAR Fortran multigrain parallelizing compiler and evaluated on a Sun Ultra80 four-processor SMP workstation using Swim and Tomcatv from SPECfp95. The proposed scheme achieves a 4.56 times speedup for Swim and a 2.37 times speedup for Tomcatv on 4 processors, both relative to the Sun Forte HPC 6 loop parallelizing compiler.
UR - http://www.scopus.com/inward/record.url?scp=68749120674&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=68749120674&partnerID=8YFLogxK
U2 - 10.1007/3-540-47847-7_44
DO - 10.1007/3-540-47847-7_44
M3 - Conference contribution
AN - SCOPUS:68749120674
SN - 354043674X
SN - 9783540436744
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 479
EP - 489
BT - High Performance Computing - 4th International Symposium, ISHPC 2002, Proceedings
T2 - 4th International Symposium on High Performance Computing, ISHPC 2002
Y2 - 15 May 2002 through 17 May 2002
ER -