Application of Multi-modal Fusion Attention Mechanism in Semantic Segmentation

Yunlong Liu*, Osamu Yoshie, Hiroshi Watanabe

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Advances in deep learning algorithms have renewed researchers' interest in semantic segmentation, a long-standing problem in computer vision. This research investigates multi-modal semantic segmentation on RGB-D images, which combine two modalities: RGB and depth. For cross-modal calibration and fusion, it presents a novel FFCA Module, which improves segmentation results by drawing complementary information from both modalities. The module is plug-and-play and can be integrated into existing neural networks. To validate it, a multi-modal semantic segmentation network named FFCANet was designed, with a dual-branch encoder and a global context module, built on the classic combination of ResNet and DeepLabV3+ as the backbone. Compared with the baseline, the proposed model substantially improves accuracy on the semantic segmentation task.
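The abstract does not describe the internals of the FFCA Module. As a rough illustration of what cross-modal calibration and fusion can mean, the sketch below is a hypothetical, parameter-free toy in plain Python: each modality is summarized per channel by global average pooling, one modality's summary gates the other modality's channels through a sigmoid, and the two calibrated feature maps are summed. The names `cross_modal_fuse`, `global_avg_pool` and `sigmoid`, and the list-based feature layout, are assumptions for illustration only; a practical module would use learned layers and a tensor library, and this is not the authors' FFCA design.

```python
import math
from typing import List

# Hypothetical layout: a feature map is a list of channels, and each channel
# is a flattened list of H*W activations.
Feature = List[List[float]]


def global_avg_pool(x: Feature) -> List[float]:
    """Squeeze each channel to a single descriptor (its mean activation)."""
    return [sum(ch) / len(ch) for ch in x]


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def cross_modal_fuse(rgb: Feature, depth: Feature) -> Feature:
    """Toy cross-modal calibration and fusion (NOT the paper's FFCA module).

    Each modality's channels are re-weighted by sigmoid gates computed from
    the other modality's channel descriptors; the calibrated maps are then
    summed element-wise. Learned projection layers are deliberately omitted.
    """
    if len(rgb) != len(depth):
        raise ValueError("RGB and depth must have the same number of channels")
    gate_from_rgb = [sigmoid(m) for m in global_avg_pool(rgb)]
    gate_from_depth = [sigmoid(m) for m in global_avg_pool(depth)]

    fused: Feature = []
    for c, (rgb_ch, depth_ch) in enumerate(zip(rgb, depth)):
        if len(rgb_ch) != len(depth_ch):
            raise ValueError("RGB and depth must have the same spatial size")
        # Depth statistics calibrate RGB channel c, and RGB statistics
        # calibrate depth channel c; the results are added together.
        fused.append([
            r * gate_from_depth[c] + d * gate_from_rgb[c]
            for r, d in zip(rgb_ch, depth_ch)
        ])
    return fused
```

For example, `cross_modal_fuse([[0.0, 0.0], [2.0, 2.0]], [[0.0, 0.0], [1.0, 1.0]])` returns two fused channels of two positions each. Gating each modality with the other's descriptor is what makes the step cross-modal rather than a per-modality self-attention.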

Original language: English
Title of host publication: Computer Vision – ACCV 2022 - 16th Asian Conference on Computer Vision, Proceedings
Editors: Lei Wang, Juergen Gall, Tat-Jun Chin, Imari Sato, Rama Chellappa
Publisher: Springer Science and Business Media Deutschland GmbH
Number of pages: 20
ISBN (Print): 9783031262920
Publication status: Published - 2023
Event: 16th Asian Conference on Computer Vision, ACCV 2022 - Macao, China
Duration: 2022 Dec 4 → 2022 Dec 8

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13847 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: 16th Asian Conference on Computer Vision, ACCV 2022

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science

