TY - GEN
T1 - Sparseness Ratio Allocation and Neuron Re-pruning for Neural Networks Compression
AU - Guo, Li
AU - Zhou, Dajiang
AU - Zhou, Jinjia
AU - Kimura, Shinji
N1 - Funding Information:
This work was supported by JST, PRESTO Grant Number JPMJPR1757, Japan; by the Waseda University Graduate Program for Embodiment Informatics (FY2013-FY2019); and by a Research Fellowship of the Japan Society for the Promotion of Science for Young Scientists.
Publisher Copyright:
© 2018 IEEE.
PY - 2018/4/26
Y1 - 2018/4/26
N2 - Convolutional neural networks (CNNs) are rapidly gaining popularity in artificial intelligence applications and are increasingly deployed on mobile devices. This is challenging, however, because of the high computational complexity of CNNs and the limited hardware resources of mobile devices. Compressing the CNN model is an effective way to address this issue. This work presents a new model compression framework consisting of sparseness ratio allocation (SRA) and neuron re-pruning (NRP). SRA determines the percentage of weights to prune in each layer so as to achieve a higher overall sparseness ratio. NRP is performed after conventional weight pruning to further remove redundant neurons while preserving accuracy. Experimental results show that, with a slight accuracy drop of 0.1%, the proposed framework achieves 149.3× compression on LeNet-5. Storage size is reduced by about 50% relative to previous works, and 8-45.2% of computational energy and 11.5-48.2% of memory traffic energy are saved.
AB - Convolutional neural networks (CNNs) are rapidly gaining popularity in artificial intelligence applications and are increasingly deployed on mobile devices. This is challenging, however, because of the high computational complexity of CNNs and the limited hardware resources of mobile devices. Compressing the CNN model is an effective way to address this issue. This work presents a new model compression framework consisting of sparseness ratio allocation (SRA) and neuron re-pruning (NRP). SRA determines the percentage of weights to prune in each layer so as to achieve a higher overall sparseness ratio. NRP is performed after conventional weight pruning to further remove redundant neurons while preserving accuracy. Experimental results show that, with a slight accuracy drop of 0.1%, the proposed framework achieves 149.3× compression on LeNet-5. Storage size is reduced by about 50% relative to previous works, and 8-45.2% of computational energy and 11.5-48.2% of memory traffic energy are saved.
KW - Model compression
KW - connection/neuron pruning
KW - neuron re-pruning
KW - sparseness ratio allocation
UR - http://www.scopus.com/inward/record.url?scp=85057134942&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85057134942&partnerID=8YFLogxK
U2 - 10.1109/ISCAS.2018.8351094
DO - 10.1109/ISCAS.2018.8351094
M3 - Conference contribution
AN - SCOPUS:85057134942
T3 - Proceedings - IEEE International Symposium on Circuits and Systems
BT - 2018 IEEE International Symposium on Circuits and Systems, ISCAS 2018 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2018 IEEE International Symposium on Circuits and Systems, ISCAS 2018
Y2 - 27 May 2018 through 30 May 2018
ER -