TY - GEN
T1 - Contextual information based network with high-frequency feature fusion for high frame rate and ultra-low delay small-scale object detection
AU - Huang, Dongmei
AU - Zhang, Jihan
AU - Hu, Tingting
AU - Fuchikami, Ryuji
AU - Ikenaga, Takashi
N1 - Funding Information:
This work was supported by KAKENHI (21K11816).
Publisher Copyright:
© 2021 MVA Organization.
PY - 2021/7/25
Y1 - 2021/7/25
N2 - High frame rate and ultra-low delay small-scale object detection plays an important role in factory automation, where timely and accurate reaction is required. Although many CNN-based detection methods have been proposed to improve the accuracy of small object detection, which suffers from low resolution and a large gap between the object and the background, it remains difficult to achieve a good trade-off between accuracy and speed. In pursuit of ultra-low delay processing on an FPGA, this paper proposes: (A) an IoU- and distance-based loss function, (B) contextual-information-based parallel detection exploiting high temporal correlation, and (C) high-frequency feature fusion for enhancing low-bit networks. The proposed methods achieve 45.3% mAP on the test sequences, which is only 0.7% mAP lower than the general method. Meanwhile, the model size is compressed to 1.94% of the original and reaches a speed of 278 fps on an FPGA and 15 fps on a GPU.
AB - High frame rate and ultra-low delay small-scale object detection plays an important role in factory automation, where timely and accurate reaction is required. Although many CNN-based detection methods have been proposed to improve the accuracy of small object detection, which suffers from low resolution and a large gap between the object and the background, it remains difficult to achieve a good trade-off between accuracy and speed. In pursuit of ultra-low delay processing on an FPGA, this paper proposes: (A) an IoU- and distance-based loss function, (B) contextual-information-based parallel detection exploiting high temporal correlation, and (C) high-frequency feature fusion for enhancing low-bit networks. The proposed methods achieve 45.3% mAP on the test sequences, which is only 0.7% mAP lower than the general method. Meanwhile, the model size is compressed to 1.94% of the original and reaches a speed of 278 fps on an FPGA and 15 fps on a GPU.
UR - http://www.scopus.com/inward/record.url?scp=85113997027&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85113997027&partnerID=8YFLogxK
U2 - 10.23919/MVA51890.2021.9511387
DO - 10.23919/MVA51890.2021.9511387
M3 - Conference contribution
AN - SCOPUS:85113997027
T3 - Proceedings of MVA 2021 - 17th International Conference on Machine Vision Applications
BT - Proceedings of MVA 2021 - 17th International Conference on Machine Vision Applications
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 17th International Conference on Machine Vision Applications, MVA 2021
Y2 - 25 July 2021 through 27 July 2021
ER -