TY - GEN
T1 - Near fine grain parallel processing using static scheduling on single chip multiprocessors
AU - Kimura, Keiji
AU - Kasahara, Hironori
N1 - Publisher Copyright:
© 2000 IEEE.
PY - 1999
Y1 - 1999
N2 - With the increase of the number of transistors integrated on a chip, efficient use of transistors and scalable improvement of effective performance of a processor are getting im-portant problems. However, it has been thought that popular superscalar and VLIW would have difficulty to obtain scalable improvement of effective performance in future because of the limitation of instruction level parallelism. To cope with this problem, a single chip multiprocessor (SCM) approach with multi grain parallel processing inside a chip, which hierarchically exploits loop parallelism and coarse grain parallelism among subroutines, loops and basic blocks in addition to instruction level parallelism, is thought one of the most promising approaches. This paper evaluates effectiveness of the single chip multiprocessor architectures with a shared cache, global registers, distributed shared memory and/or local memory for near fine grain parallel processing as the first step of research on SCM architecture to support multi grain parallel processing. The evaluation shows OSCAR (Optimally Scheduled Advanced Multiprocessor) architecture having distributed shared memory and local memory in addition to centralized shared memory and attachment of global register gives us significant speed up such as 13.8% to 143.8% for four pro-cessors compared with shared cache architecture for applications which have been difficult to extract parallelism effectively.
AB - With the increase of the number of transistors integrated on a chip, efficient use of transistors and scalable improvement of effective performance of a processor are getting im-portant problems. However, it has been thought that popular superscalar and VLIW would have difficulty to obtain scalable improvement of effective performance in future because of the limitation of instruction level parallelism. To cope with this problem, a single chip multiprocessor (SCM) approach with multi grain parallel processing inside a chip, which hierarchically exploits loop parallelism and coarse grain parallelism among subroutines, loops and basic blocks in addition to instruction level parallelism, is thought one of the most promising approaches. This paper evaluates effectiveness of the single chip multiprocessor architectures with a shared cache, global registers, distributed shared memory and/or local memory for near fine grain parallel processing as the first step of research on SCM architecture to support multi grain parallel processing. The evaluation shows OSCAR (Optimally Scheduled Advanced Multiprocessor) architecture having distributed shared memory and local memory in addition to centralized shared memory and attachment of global register gives us significant speed up such as 13.8% to 143.8% for four pro-cessors compared with shared cache architecture for applications which have been difficult to extract parallelism effectively.
UR - http://www.scopus.com/inward/record.url?scp=84944409456&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84944409456&partnerID=8YFLogxK
U2 - 10.1109/IWIA.1999.898840
DO - 10.1109/IWIA.1999.898840
M3 - Conference contribution
AN - SCOPUS:84944409456
T3 - Proceedings of the Innovative Architecture for Future Generation High-Performance Processors and Systems
SP - 23
EP - 31
BT - Innovative Architecture for Future Generation High-Performance Processors and Systems - 1999 International Workshop on Innovative Architectures, IWIA 1999
A2 - Nakashima, Hiroshi
A2 - Veidenbaum, Alex
PB - IEEE Computer Society
T2 - 1999 International Workshop on Innovative Architectures, IWIA 1999
Y2 - 1 November 1999 through 3 November 1999
ER -