TY - GEN
T1 - STREAMING-CAPABLE HIGH-PERFORMANCE ARCHITECTURE OF LEARNED IMAGE COMPRESSION CODECS
AU - Lin, Fangzheng
AU - Sun, Heming
AU - Katto, Jiro
N1 - Funding Information:
This work was supported in part by NICT, Grant Number 03801, Japan; in part by JST, PRESTO Grant Number JPMJPR19M5, Japan; in part by Japan Society for the Promotion of Science (JSPS), under Grant 21K17770; and in part by Kenjiro Takayanagi Foundation.
Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Learned image compression achieves state-of-the-art accuracy and compression ratios, but the relatively slow runtime of learned codecs limits their usage. While previous attempts at optimizing learned image codecs focused mainly on the neural model and entropy coding, we present an alternative method for improving the runtime performance of various learned image compression models. We introduce multi-threaded pipelining and an optimized memory model to enable asynchronous execution of GPU and CPU workloads, fully taking advantage of computational resources. Our architecture alone already produces excellent performance without any change to the neural model itself. We also demonstrate that combining our architecture with previous tweaks to the neural models can further improve runtime performance. We show that our implementations excel in throughput and latency compared to the baseline, and demonstrate their performance by creating a real-time video streaming encoder-decoder sample application, with the encoder running on an embedded device.
AB - Learned image compression achieves state-of-the-art accuracy and compression ratios, but the relatively slow runtime of learned codecs limits their usage. While previous attempts at optimizing learned image codecs focused mainly on the neural model and entropy coding, we present an alternative method for improving the runtime performance of various learned image compression models. We introduce multi-threaded pipelining and an optimized memory model to enable asynchronous execution of GPU and CPU workloads, fully taking advantage of computational resources. Our architecture alone already produces excellent performance without any change to the neural model itself. We also demonstrate that combining our architecture with previous tweaks to the neural models can further improve runtime performance. We show that our implementations excel in throughput and latency compared to the baseline, and demonstrate their performance by creating a real-time video streaming encoder-decoder sample application, with the encoder running on an embedded device.
KW - high-performance
KW - learned image compression
KW - pipelining
KW - real-time streaming
UR - http://www.scopus.com/inward/record.url?scp=85146735645&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85146735645&partnerID=8YFLogxK
U2 - 10.1109/ICIP46576.2022.9897695
DO - 10.1109/ICIP46576.2022.9897695
M3 - Conference contribution
AN - SCOPUS:85146735645
T3 - Proceedings - International Conference on Image Processing, ICIP
SP - 286
EP - 290
BT - 2022 IEEE International Conference on Image Processing, ICIP 2022 - Proceedings
PB - IEEE Computer Society
T2 - 29th IEEE International Conference on Image Processing, ICIP 2022
Y2 - 16 October 2022 through 19 October 2022
ER -