Abstract
This paper proposes a data-localization scheme for macro-dataflow computation, in which coarse-grain tasks such as loops, subroutines, and basic blocks in a Fortran program are dynamically scheduled onto processors and executed in parallel. The proposed scheme reduces the overhead of data transfer through centralized shared memory by making effective use of local memory to pass shared data among coarse-grain tasks, especially loops. The compilation scheme first uses a loop-aligned decomposition method to split multiple loops that have data dependences among them so that their data can be localized. It then fuses the decomposed loops that would otherwise exchange a large amount of data into a single macrotask, which is assigned to a processor at run time. The scheme has been implemented on OSCAR, an actual multiprocessor system that has centralized shared memory and distributed shared memory in addition to local memory on each processor. Performance evaluation on OSCAR shows that the proposed data-localization scheme can reduce execution time by 21%.
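The decompose-then-fuse idea can be illustrated with a small Python sketch. This is a simplified illustration under stated assumptions, not the OSCAR compiler's implementation. Two hypothetical loops are assumed: loop 1 produces A(I) = 2*I and loop 2 consumes it as B(I) = A(I) + 1, so each B(I) depends only on A(I). Both loops are split at identical iteration boundaries (the alignment step). Each aligned pair is fused into one macrotask that keeps its slice of A in processor-local storage, and a thread pool stands in for run-time dynamic scheduling. All function names and loop bodies are hypothetical, and the transfer counter tallies only accesses to A through shared memory. Loops whose dependences cross chunk boundaries would need additional handling that this sketch omits.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List, Tuple


def loop1_body(i: int) -> float:
    # Hypothetical producer loop body: A(I) = 2*I
    return 2.0 * i


def loop2_body(a_i: float) -> float:
    # Hypothetical consumer loop body: B(I) = A(I) + 1
    return a_i + 1.0


def run_unlocalized(n: int) -> Tuple[List[float], int]:
    """Baseline: loop 1 stores A in shared memory, loop 2 loads it back."""
    shared_a = [0.0] * n
    transfers = 0
    for i in range(n):                 # loop 1
        shared_a[i] = loop1_body(i)
        transfers += 1                 # one store of A(I) to shared memory
    b = [0.0] * n
    for i in range(n):                 # loop 2
        b[i] = loop2_body(shared_a[i])
        transfers += 1                 # one load of A(I) from shared memory
    return b, transfers


@dataclass
class FusedMacrotask:
    """Aligned partial loops of loop 1 and loop 2 over iterations [start, stop)."""
    start: int
    stop: int

    def run(self, b: List[float]) -> int:
        # Partial loop 1: keep A(start:stop) in processor-local storage.
        local_a = [loop1_body(i) for i in range(self.start, self.stop)]
        # Partial loop 2: read A only from local storage.
        for i in range(self.start, self.stop):
            b[i] = loop2_body(local_a[i - self.start])
        return 0  # no element of A passes through shared memory


def aligned_decomposition(n: int, groups: int) -> List[FusedMacrotask]:
    """Split both loops at the same boundaries and fuse each aligned pair."""
    size = max(1, -(-n // groups))     # ceiling division, at least 1
    return [FusedMacrotask(s, min(s + size, n)) for s in range(0, n, size)]


def run_localized(n: int, groups: int) -> Tuple[List[float], int]:
    b = [0.0] * n
    tasks = aligned_decomposition(n, groups)
    # Idle worker threads pick up macrotasks at run time (dynamic scheduling stand-in).
    with ThreadPoolExecutor(max_workers=groups) as pool:
        transfers = sum(pool.map(lambda t: t.run(b), tasks))
    return b, transfers


if __name__ == "__main__":
    base, base_tx = run_unlocalized(12)
    loc, loc_tx = run_localized(12, 3)
    assert base == loc                 # same results either way
    print(base_tx, loc_tx)             # prints "24 0"
```

Both versions compute the same B. In the baseline, every element of A crosses shared memory twice (one store and one load). In the fused version, the transfers of A disappear because producer and consumer iterations run on the same worker.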
Original language | English |
---|---|
Pages | 135-140 |
Number of pages | 6 |
Publication status | Published - 1995 |
Event | Proceedings of the 1995 IEEE Pacific Rim Conference on Communications, Computers, and Signal Processing - Victoria, BC, Canada |
Duration | 1995 May 17 → 1995 May 19 |
ASJC Scopus subject areas
- Signal Processing
- Computer Networks and Communications