TY - GEN
T1 - Accelerating convolutional neural network inference based on a reconfigurable sliced systolic array
AU - Zeng, Yixuan
AU - Sun, Heming
AU - Katto, Jiro
AU - Fan, Yibo
N1 - Funding Information:
This work was supported in part by the National Natural Science Foundation of China under Grant 62031009, in part by the Shanghai Science and Technology Committee (STCSM) under Grant 19511104300, in part by the Alibaba Innovative Research (AIR) Program, in part by the Innovation Program of Shanghai Municipal Education Commission, in part by the Fudan University-CIOMP Joint Fund (FC2019-001), and in part by JST, PRESTO under Grant JPMJPR19M5. Y. Fan and Y. Zeng are with the State Key Laboratory of ASIC and System, Fudan University, China (e-mail: yxzeng18@fudan.edu.cn; fanyibo@fudan.edu.cn).
Publisher Copyright:
© 2021 IEEE
PY - 2021
Y1 - 2021
N2 - Convolutional neural networks (CNNs) have achieved great success in many computer vision tasks, such as image recognition, video processing, and target detection. In recent years, many hardware designs have been devoted to accelerating CNN inference. To further speed up CNN inference and reduce data waste, this work proposes a reconfigurable sliced systolic array: 1) Depending on the number of network nodes in each layer, the slice mode can be dynamically configured to achieve high throughput and resource utilization. 2) To take full advantage of convolution reuse and weight reuse, a tile-column sliding (TCS) processing dataflow is designed. 3) A four-stage for-loop algorithm is employed, which divides the CNN computation into several parts based on the input and output nodes. The entire CNN inference is carried out using integer-only arithmetic originating from TensorFlow Lite. Experimental results show that these strategies lead to significant improvements in inference performance and energy efficiency.
AB - Convolutional neural networks (CNNs) have achieved great success in many computer vision tasks, such as image recognition, video processing, and target detection. In recent years, many hardware designs have been devoted to accelerating CNN inference. To further speed up CNN inference and reduce data waste, this work proposes a reconfigurable sliced systolic array: 1) Depending on the number of network nodes in each layer, the slice mode can be dynamically configured to achieve high throughput and resource utilization. 2) To take full advantage of convolution reuse and weight reuse, a tile-column sliding (TCS) processing dataflow is designed. 3) A four-stage for-loop algorithm is employed, which divides the CNN computation into several parts based on the input and output nodes. The entire CNN inference is carried out using integer-only arithmetic originating from TensorFlow Lite. Experimental results show that these strategies lead to significant improvements in inference performance and energy efficiency.
KW - Convolutional neural network
KW - Deep learning accelerator
KW - Systolic array
UR - http://www.scopus.com/inward/record.url?scp=85109009412&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85109009412&partnerID=8YFLogxK
U2 - 10.1109/ISCAS51556.2021.9401287
DO - 10.1109/ISCAS51556.2021.9401287
M3 - Conference contribution
AN - SCOPUS:85109009412
T3 - Proceedings - IEEE International Symposium on Circuits and Systems
BT - 2021 IEEE International Symposium on Circuits and Systems, ISCAS 2021 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 53rd IEEE International Symposium on Circuits and Systems, ISCAS 2021
Y2 - 22 May 2021 through 28 May 2021
ER -