TY - GEN
T1 - Quad-multiplier packing based on customized floating point for convolutional neural networks on FPGA
AU - Zhang, Zhifeng
AU - Zhou, Dajiang
AU - Wang, Shihao
AU - Kimura, Shinji
N1 - Funding Information:
We would like to thank the members of the Kimura Laboratory at Waseda University for their discussions. This work is supported in part by funding from NEC.
Publisher Copyright:
© 2018 IEEE.
PY - 2018/2/20
Y1 - 2018/2/20
AB - Deep convolutional neural networks (CNNs) are widely used in many computer vision tasks. Since CNNs involve billions of computations, it is critical to reduce resource/power consumption and improve parallelism. Compared with the extensive research on fixed-point conversion for cost reduction, floating-point customization has received less attention due to its higher cost than fixed point. This paper explores customized floating point for both the training and inference of CNNs. A 9-bit customized floating-point format is found sufficient for training ResNet-20 on the CIFAR-10 dataset with less than 1% accuracy loss, and it can also be applied to the inference of CNNs. With the reduced bit-width, a computational unit (CU) based on Quad-Multiplier Packing is proposed to improve the resource efficiency of CNNs on FPGA. This design saves 87.5% of DSP slices and 62.5% of LUTs on a Xilinx Kintex-7 platform compared to a CU using 32-bit floating point. More CUs can thus be arranged on the FPGA, and higher throughput can be expected accordingly.
UR - http://www.scopus.com/inward/record.url?scp=85045323738&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85045323738&partnerID=8YFLogxK
U2 - 10.1109/ASPDAC.2018.8297303
DO - 10.1109/ASPDAC.2018.8297303
M3 - Conference contribution
AN - SCOPUS:85045323738
T3 - Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC
SP - 184
EP - 189
BT - ASP-DAC 2018 - 23rd Asia and South Pacific Design Automation Conference, Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 23rd Asia and South Pacific Design Automation Conference, ASP-DAC 2018
Y2 - 22 January 2018 through 25 January 2018
ER -