Deep image compression based on multi-scale deformable convolution

Daowen Li, Yingming Li, Heming Sun, Lu Yu*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

The efficiency of deep image compression has improved considerably in recent years. However, to fully exploit context information when compressing image objects of different scales and shapes, the geometric structure of the inputs should be handled more adaptively. In this paper, we introduce deformable convolution and its spatial-attention extension into the deep image compression task to fully exploit context information. Specifically, a novel deep image compression network with Multi-Scale Deformable Convolution and Spatial Attention, named MS-DCSA, is proposed to extract more compact and efficient latent representations and to reconstruct higher-quality images. First, multi-scale deformable convolution is presented to provide multi-scale receptive fields for learning the spatial sampling offsets used in the deformable operations. Subsequently, a multi-scale deformable spatial attention module is developed to generate attention masks that re-weight the extracted features according to their importance. In addition, multi-scale deformable convolution is applied to design refined up/down-sampling modules. Extensive experiments demonstrate that the proposed MS-DCSA network achieves improved performance under both the PSNR and MS-SSIM quality metrics, compared to conventional as well as competing deep image compression methods.
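
For a concrete picture of the core building block, the following is a minimal PyTorch sketch of a multi-scale deformable convolution combined with a spatial attention mask, in the spirit of the MS-DCSA design summarized above. The module name, the number of offset branches, the dilation rates, and the channel sizes are illustrative assumptions rather than the authors' exact architecture; the modulated deformable convolution itself comes from torchvision.ops.DeformConv2d.

```python
# Illustrative sketch only: branch count, dilations, and channel split are
# assumptions, not the exact MS-DCSA architecture from the paper.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d


class MultiScaleDeformableBlock(nn.Module):
    """Predicts sampling offsets at several receptive-field scales, applies a
    modulated deformable convolution, and re-weights the sampling points with
    a spatial attention mask."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        offset_ch = 2 * kernel_size * kernel_size  # (dx, dy) per sampling point
        mask_ch = kernel_size * kernel_size        # one modulation weight per point
        # Offset branches with different dilations give different receptive
        # fields for estimating the spatial sampling offsets.
        self.offset_branches = nn.ModuleList([
            nn.Conv2d(channels, offset_ch, kernel_size, padding=pad * d, dilation=d)
            for d in (1, 2, 3)
        ])
        self.fuse_offsets = nn.Conv2d(3 * offset_ch, offset_ch, 1)
        # Spatial attention mask in [0, 1], used as the modulation term of the
        # deformable convolution.
        self.mask_conv = nn.Sequential(
            nn.Conv2d(channels, mask_ch, kernel_size, padding=pad),
            nn.Sigmoid(),
        )
        self.deform_conv = DeformConv2d(channels, channels, kernel_size, padding=pad)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        offsets = self.fuse_offsets(
            torch.cat([branch(x) for branch in self.offset_branches], dim=1)
        )
        mask = self.mask_conv(x)
        return self.deform_conv(x, offsets, mask)


# Usage: a 192-channel latent feature map, a common width in learned image codecs.
feat = torch.randn(1, 192, 32, 32)
out = MultiScaleDeformableBlock(192)(feat)
print(out.shape)  # torch.Size([1, 192, 32, 32])
```

In this sketch, predicting the offsets with branches of increasing dilation is one way to realize multi-scale receptive fields, and the sigmoid mask plays the role of the spatial attention re-weighting described in the abstract.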

Original language: English
Article number: 103573
Journal: Journal of Visual Communication and Image Representation
Volume: 87
Publication status: Published - August 2022

Keywords

  • Deep image compression
  • Multi-scale deformable convolution
  • Spatial attention

ASJC Scopus subject areas

  • Signal Processing
  • Media Technology
  • Computer Vision and Pattern Recognition
  • Electrical and Electronic Engineering
