TY - GEN
T1 - ViT-GAN
T2 - 3rd International Conference on Computer Communication and the Internet, ICCCI 2021
AU - Hirose, Shota
AU - Wada, Naoki
AU - Katto, Jiro
AU - Sun, Heming
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021/6/25
Y1 - 2021/6/25
N2 - Attention is now regarded as an efficient way to recognize images. The Vision Transformer (ViT) applies a Transformer to images and achieves very high performance in image recognition, with fewer parameters than Big Transfer (BiT) and Noisy Student. We therefore consider self-attention-based networks to be slimmer than convolution-based networks. We use a ViT as the discriminator in a Generative Adversarial Network (GAN) to obtain the same performance with a smaller model, and we name the result ViT-GAN. In addition, we find that parameter sharing is very useful for building a parameter-efficient ViT. However, the performance of ViT depends heavily on the number of training samples, so we propose a new data augmentation method in which the augmentation strength varies adaptively, helping ViT converge faster and perform better. With this data augmentation, we show that the ViT-based discriminator achieves almost the same FID while using 35% fewer parameters than the original discriminator.
AB - Attention is now regarded as an efficient way to recognize images. The Vision Transformer (ViT) applies a Transformer to images and achieves very high performance in image recognition, with fewer parameters than Big Transfer (BiT) and Noisy Student. We therefore consider self-attention-based networks to be slimmer than convolution-based networks. We use a ViT as the discriminator in a Generative Adversarial Network (GAN) to obtain the same performance with a smaller model, and we name the result ViT-GAN. In addition, we find that parameter sharing is very useful for building a parameter-efficient ViT. However, the performance of ViT depends heavily on the number of training samples, so we propose a new data augmentation method in which the augmentation strength varies adaptively, helping ViT converge faster and perform better. With this data augmentation, we show that the ViT-based discriminator achieves almost the same FID while using 35% fewer parameters than the original discriminator.
KW - Data Augmentation
KW - Generative Adversarial Network
KW - Vision Transformer
UR - http://www.scopus.com/inward/record.url?scp=85112175205&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85112175205&partnerID=8YFLogxK
U2 - 10.1109/ICCCI51764.2021.9486805
DO - 10.1109/ICCCI51764.2021.9486805
M3 - Conference contribution
AN - SCOPUS:85112175205
T3 - 2021 3rd International Conference on Computer Communication and the Internet, ICCCI 2021
SP - 185
EP - 189
BT - 2021 3rd International Conference on Computer Communication and the Internet, ICCCI 2021
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 25 June 2021 through 27 June 2021
ER -