TY - GEN
T1 - Learning in Compressed Domain for Faster Machine Vision Tasks
AU - Liu, Jinming
AU - Sun, Heming
AU - Katto, Jiro
N1 - Funding Information:
This paper is supported by the Kenjiro Takayanagi Foundation, the Hoso Bunka Foundation, JST PRESTO under Grant JPMJPR19M5, and JSPS KAKENHI under Grant 21K17770.
Publisher Copyright:
© 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - Learned image compression (LIC) has demonstrated strong performance both on tasks driven by reconstruction quality (e.g., measured by PSNR or MS-SSIM) and on machine vision tasks such as image understanding. However, most LIC frameworks operate in the pixel domain, which requires a decoding process. In this paper, we develop a learned compressed-domain framework for machine vision tasks. 1) By sending the compressed latent representation directly to the task network, the decoding computation is eliminated, which reduces complexity. 2) By sorting the latent channels by entropy, only selected channels are transmitted to the task network, which reduces the bitrate. As a result, compared with traditional pixel-domain methods, we reduce multiply-add operations (MACs) by about 1/3 and inference time by about 1/5 while maintaining the same accuracy. Moreover, the proposed channel selection contributes up to 6.8% bitrate savings.
AB - Learned image compression (LIC) has demonstrated strong performance both on tasks driven by reconstruction quality (e.g., measured by PSNR or MS-SSIM) and on machine vision tasks such as image understanding. However, most LIC frameworks operate in the pixel domain, which requires a decoding process. In this paper, we develop a learned compressed-domain framework for machine vision tasks. 1) By sending the compressed latent representation directly to the task network, the decoding computation is eliminated, which reduces complexity. 2) By sorting the latent channels by entropy, only selected channels are transmitted to the task network, which reduces the bitrate. As a result, compared with traditional pixel-domain methods, we reduce multiply-add operations (MACs) by about 1/3 and inference time by about 1/5 while maintaining the same accuracy. Moreover, the proposed channel selection contributes up to 6.8% bitrate savings.
KW - Compressed domain
KW - Face alignment
KW - Image compression
KW - Video coding for machine
UR - http://www.scopus.com/inward/record.url?scp=85125260905&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85125260905&partnerID=8YFLogxK
U2 - 10.1109/VCIP53242.2021.9675369
DO - 10.1109/VCIP53242.2021.9675369
M3 - Conference contribution
AN - SCOPUS:85125260905
T3 - 2021 International Conference on Visual Communications and Image Processing, VCIP 2021 - Proceedings
BT - 2021 International Conference on Visual Communications and Image Processing, VCIP 2021 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 International Conference on Visual Communications and Image Processing, VCIP 2021
Y2 - 5 December 2021 through 8 December 2021
ER -