TY - JOUR
T1 - QA-Filter: A QP-Adaptive Convolutional Neural Network Filter for Video Coding
T2 - IEEE Transactions on Image Processing
AU - Liu, Chao
AU - Sun, Heming
AU - Katto, Jiro
AU - Zeng, Xiaoyang
AU - Fan, Yibo
N1 - Funding Information:
This work was supported in part by the National Natural Science Foundation of China under Grant 62031009; in part by the Alibaba Innovative Research (AIR) Program; in part by the Innovation Program of Shanghai Municipal Education Commission; in part by the Fudan University-Changchun Institute of Optics, Fine Mechanics and Physics (CIOMP) Joint Fund under Grant FC2019-001; in part by the Fudan-ZTE Joint Lab; in part by the Pioneering Project of the Academy for Engineering and Technology, Fudan University, under Grant gyy2021-001; in part by the Japan Science and Technology Agency (JST), Precursory Research for Embryonic Science and Technology (PRESTO), under Grant JPMJPR19M5; in part by the Japan Society for the Promotion of Science (JSPS), Grants-in-Aid for Scientific Research (KAKENHI), under Grant 21K17770; and in part by the Kenjiro Takayanagi Foundation. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Chaker Larabi.
Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Convolutional neural network (CNN)-based filters have achieved great success in video coding. However, in most previous works, an individual model was needed for each quantization parameter (QP) band, which is impractical given limited storage resources. To address this, our work consists of two parts. First, we propose a frequency and spatial QP-adaptive mechanism (FSQAM), which can be applied directly to the (vanilla) convolution to help any CNN filter handle different levels of quantization noise. From the frequency domain, a frequency QP-adaptive mechanism (FQAM) that introduces the quantization step (Qstep) into the convolution is proposed: as the quantization noise increases, the CNN filter's ability to suppress that noise improves. Moreover, a spatial QP-adaptive mechanism (SQAM) is further designed to compensate for FQAM from the spatial domain. Second, based on FSQAM, a QP-adaptive CNN filter called QA-Filter, usable over a wide range of QPs, is proposed. By factorizing the mixed features into high-frequency and low-frequency parts with a pair of pooling and upsampling operations, QA-Filter and FQAM promote each other to obtain better performance. Compared to the H.266/VVC baseline, QA-Filter achieves average luma BD-rate reductions of 5.25% and 3.84% under the default all-intra (AI) and random-access (RA) configurations, respectively. Additionally, a BD-rate reduction of up to 9.16% is achieved on the luma component of the sequence BasketballDrill. Furthermore, FSQAM achieves measurably better BD-rate performance than the previous QP-map method.
KW - Convolutional neural network
KW - H.266/VVC
KW - in-loop filter
KW - video coding
UR - http://www.scopus.com/inward/record.url?scp=85127816221&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85127816221&partnerID=8YFLogxK
DO - 10.1109/TIP.2022.3152627
M3 - Article
C2 - 35385382
AN - SCOPUS:85127816221
SN - 1057-7149
VL - 31
SP - 3032
EP - 3045
JO - IEEE Trans. Image Process.
JF - IEEE Transactions on Image Processing
ER -