TY - GEN
T1 - A Transformer-based Semantic Segmentation Model for Street Fashion Images
AU - Peng, Dingjie
AU - Kameyama, Wataru
N1 - Publisher Copyright: © 2023 SPIE.
PY - 2023
Y1 - 2023
N2 - Semantic segmentation is a pixel-level classification problem in computer vision, in which pixels of the same class are grouped into a single category in order to interpret pictures at the pixel level. In this field, semantic segmentation of street fashion images is a challenging task because clothing items appear with wide variations in fabric, layering, occlusion, and viewpoint. To help better understand street fashion images, we propose a lightweight Semantic Context Aware Transformer (SCAT) for the semantic segmentation of street fashion images, which integrates semantic context into the encoding and models the relationship between multi-level outputs from transformer layers. Extensive experiments and comparisons show that the proposal achieves state-of-the-art results on the ModaNet dataset with a relatively small model size, improving on Shunted Transformer by over 1.1 points and surpassing other CNNs and Transformers by a large margin of over 2 points in mIoU.
AB - Semantic segmentation is a pixel-level classification problem in computer vision, in which pixels of the same class are grouped into a single category in order to interpret pictures at the pixel level. In this field, semantic segmentation of street fashion images is a challenging task because clothing items appear with wide variations in fabric, layering, occlusion, and viewpoint. To help better understand street fashion images, we propose a lightweight Semantic Context Aware Transformer (SCAT) for the semantic segmentation of street fashion images, which integrates semantic context into the encoding and models the relationship between multi-level outputs from transformer layers. Extensive experiments and comparisons show that the proposal achieves state-of-the-art results on the ModaNet dataset with a relatively small model size, improving on Shunted Transformer by over 1.1 points and surpassing other CNNs and Transformers by a large margin of over 2 points in mIoU.
KW - Semantic Context
KW - Semantic Segmentation
KW - Street Fashion Images
KW - Transformer
UR - http://www.scopus.com/inward/record.url?scp=85159362592&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85159362592&partnerID=8YFLogxK
U2 - 10.1117/12.2666583
DO - 10.1117/12.2666583
M3 - Conference contribution
AN - SCOPUS:85159362592
T3 - Proceedings of SPIE - The International Society for Optical Engineering
BT - International Workshop on Advanced Imaging Technology, IWAIT 2023
A2 - Nakajima, Masayuki
A2 - Kim, Jae-Gon
A2 - Seo, Kwang-deok
A2 - Yamasaki, Toshihiko
A2 - Guo, Jing-Ming
A2 - Lau, Phooi Yee
A2 - Kemao, Qian
PB - SPIE
T2 - 2023 International Workshop on Advanced Imaging Technology, IWAIT 2023
Y2 - 9 January 2023 through 11 January 2023
ER -