TY - GEN
T1 - Robust Semantic Segmentation for Street Fashion Photos
AU - Dang, Anh H.
AU - Kameyama, Wataru
N1 - Funding Information:
Manuscript received October 19, 2019. This research is supported by funding from leaftnet Co., Ltd. in Japan. Anh H. Dang is with GITI, Waseda University, Tokyo, Japan (corresponding author; phone: +81-80-1367-9637; email: anh@aoni.waseda.jp). Wataru Kameyama is with the Faculty of Science and Engineering, Waseda University, Tokyo, Japan (email: wataru@waseda.jp). Fig. 1. Samples from our custom street fashion data set. The top row shows the original images, and the bottom row shows the corresponding segmentation ground truths. The class names for each color are shown in Table I. Photos are public domain works downloaded from Pexels.com, and labels are manually annotated by the authors.
Publisher Copyright:
© 2020 Global IT Research Institute - GIRI.
PY - 2020/2
Y1 - 2020/2
AB - In this paper, we aim to produce state-of-the-art semantic segmentation for street fashion photos with three contributions. First, we propose a high-performance semantic segmentation network that follows the encoder-decoder structure. Second, we propose a guided training process using multiple auxiliary losses. Third, we propose a 2D max-pooling-based scaling operation that produces the segmentation feature maps for this guided training process. We also propose the mIoU+ metric, which takes noise into account for better evaluation. Evaluations on the ModaNet data set show that the proposed network achieves high benchmark results at lower computational cost than previously proposed methods.
KW - label pooling
KW - mIoU+
KW - semantic segmentation
KW - street fashion photos
UR - http://www.scopus.com/inward/record.url?scp=85083955843&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85083955843&partnerID=8YFLogxK
U2 - 10.23919/ICACT48636.2020.9061408
DO - 10.23919/ICACT48636.2020.9061408
M3 - Conference contribution
AN - SCOPUS:85083955843
T3 - International Conference on Advanced Communication Technology, ICACT
SP - 1248
EP - 1257
BT - 22nd International Conference on Advanced Communication Technology
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 22nd International Conference on Advanced Communication Technology, ICACT 2020
Y2 - 16 February 2020 through 19 February 2020
ER -