Abstract
This paper proposes a data-localization compilation scheme for macro-dataflow computation, in which coarse-grain tasks such as loops, subroutines, and basic blocks in a Fortran program are automatically processed in parallel on a multiprocessor system. The data-localization scheme uses local memory effectively to reduce the overhead of transferring shared data among coarse-grain tasks composed of Doall loops and sequential loops. In this scheme, a compiler partitions coarse-grain tasks (loops) that have data dependences among them into multiple groups by a loop-aligned decomposition so that data transfer among groups is kept to a minimum; generates a dynamic scheduling routine with partial static task assignment, which assigns the decomposed tasks in a group to the same processor at run time; and generates parallel machine code that passes shared data inside the group through local memory. A compiler has been implemented for OSCAR, an actual multiprocessor system that has centralized shared memory and distributed shared memory in addition to local memory on each processor. Performance evaluation on OSCAR shows that macro-dataflow computation with the proposed data-localization scheme can reduce execution time by 10% to 20% on average compared with ordinary macro-dataflow computation using centralized shared memory.
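The grouping idea in the abstract can be illustrated with a small sketch. It is not the paper's compiler algorithm: it assumes the simplest possible case, two loops over the same index range where iteration `i` of the second loop reads only the value written by iteration `i` of the first. The function names (`aligned_decomposition`, `run_group`, `run_localized`), the loop bodies, and the dictionary standing in for local memory are all hypothetical, and Python is used only for illustration.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # half-open iteration range [start, stop)


def aligned_decomposition(n_iters: int, n_groups: int) -> List[Range]:
    """Split iterations 0..n_iters-1 into contiguous, near-equal ranges.

    Both dependent loops are cut at the same boundaries, so every value
    produced in a range is consumed in the same range (assumed dependence:
    iteration i of loop 2 reads only what iteration i of loop 1 wrote).
    """
    base, extra = divmod(n_iters, n_groups)
    ranges = []
    start = 0
    for g in range(n_groups):
        stop = start + base + (1 if g < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def run_group(rng: Range, b: List[float]) -> Dict[int, float]:
    """Run both partial loops of one group on one (simulated) processor."""
    start, stop = rng
    local_a: Dict[int, float] = {}  # stands in for the processor's local memory
    # Partial loop 1 (Doall): a[i] = 2 * b[i]
    for i in range(start, stop):
        local_a[i] = 2.0 * b[i]
    # Partial loop 2: c[i] = a[i] + 1, reading a[i] from local memory only
    return {i: local_a[i] + 1.0 for i in range(start, stop)}


def run_localized(b: List[float], n_groups: int) -> List[float]:
    """Execute every group in turn and gather the final results."""
    c = [0.0] * len(b)
    for rng in aligned_decomposition(len(b), n_groups):
        for i, value in run_group(rng, b).items():
            c[i] = value
    return c
```

In this sketch the intermediate array never leaves the group that produced it, which is the effect the scheme aims for; in a real system each group would be pinned to one processor by the scheduler rather than run in a sequential loop.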
Original language | English |
---|---|
Pages | 61-68 |
Number of pages | 8 |
DOI | |
Publication status | Published - 1996 |
Event | Proceedings of the 1996 International Conference on Supercomputing - Philadelphia, PA, USA Duration: May 25, 1996 → May 28, 1996 |
ASJC Scopus subject areas
- General Computer Science