TY - GEN
T1 - Cascaded DMA controller for speedup of indirect memory access in irregular applications
AU - Kashimata, Tomoya
AU - Kitamura, Toshiaki
AU - Kimura, Keiji
AU - Kasahara, Hironori
N1 - Funding Information:
Part of this paper is based on results obtained from a project commissioned by the New Energy and Industrial Technology Development Organization (NEDO) and JSPS KAKENHI Grant Number JP18K19786.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/11
Y1 - 2019/11
N2 - Indirect memory accesses arising from sparse linear algebra calculations are common in important real-world applications. However, they cause severely inefficient memory accesses and pipeline stalls, resulting in low execution efficiency even with high memory bandwidth and abundant computational resources. A key issue with an indirect memory access such as A[B[i]] is that it requires two successive memory accesses: the index load (B[i]) followed by the data element access (A[B[i]]). To address this problem, we propose the Cascaded-DMAC (CDMAC). The CDMAC is intended to be attached to each core of a multicore chip, alongside a CPU core, a vector accelerator, and a local data memory. It transfers data between an off-chip main memory and the in-core local data memory, which supplies data to the accelerator. The key idea of the CDMAC is to cascade two DMACs so that the first loads indices and the second accesses data elements using those indices. Given an index array and an element array, this organization performs indirect memory accesses autonomously and enables efficient SIMD computation by lining up the sparse data in the local data memory. We implemented a multicore processor with the proposed CDMAC on an FPGA board. An evaluation of sparse matrix-vector multiplication on the FPGA shows that the CDMAC achieves a speedup of up to 17x compared with CPU data transfer.
AB - Indirect memory accesses arising from sparse linear algebra calculations are common in important real-world applications. However, they cause severely inefficient memory accesses and pipeline stalls, resulting in low execution efficiency even with high memory bandwidth and abundant computational resources. A key issue with an indirect memory access such as A[B[i]] is that it requires two successive memory accesses: the index load (B[i]) followed by the data element access (A[B[i]]). To address this problem, we propose the Cascaded-DMAC (CDMAC). The CDMAC is intended to be attached to each core of a multicore chip, alongside a CPU core, a vector accelerator, and a local data memory. It transfers data between an off-chip main memory and the in-core local data memory, which supplies data to the accelerator. The key idea of the CDMAC is to cascade two DMACs so that the first loads indices and the second accesses data elements using those indices. Given an index array and an element array, this organization performs indirect memory accesses autonomously and enables efficient SIMD computation by lining up the sparse data in the local data memory. We implemented a multicore processor with the proposed CDMAC on an FPGA board. An evaluation of sparse matrix-vector multiplication on the FPGA shows that the CDMAC achieves a speedup of up to 17x compared with CPU data transfer.
KW - CDMAC
KW - Cascaded DMA Controller
KW - DMA
KW - DMAC
KW - Indirect memory access
KW - SMVM
KW - SpMV
KW - Sparse matrix vector multiplication
UR - http://www.scopus.com/inward/record.url?scp=85078215116&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85078215116&partnerID=8YFLogxK
U2 - 10.1109/IA349570.2019.00017
DO - 10.1109/IA349570.2019.00017
M3 - Conference contribution
AN - SCOPUS:85078215116
T3 - 2019 IEEE/ACM 9th Workshop on Irregular Applications: Architectures and Algorithms, IA3 2019
SP - 71
EP - 76
BT - 2019 IEEE/ACM 9th Workshop on Irregular Applications
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 9th IEEE/ACM Workshop on Irregular Applications: Architectures and Algorithms, IA3 2019
Y2 - 18 November 2019
ER -