ViT-GAN: Using Vision Transformer as Discriminator with Adaptive Data Augmentation

Shota Hirose, Naoki Wada, Jiro Katto, Heming Sun

Research output: Conference contribution

3 Citations (Scopus)

Abstract

Attention is now widely regarded as an efficient mechanism for image recognition. Vision Transformer (ViT) applies a Transformer to images and achieves very high performance in image recognition. ViT has fewer parameters than Big Transfer (BiT) and Noisy Student, so we consider Self-Attention-based networks to be slimmer than convolution-based networks. We use a ViT as the Discriminator in a Generative Adversarial Network (GAN) to obtain the same performance with a smaller model, and we name this model ViT-GAN. We also find that parameter sharing is very useful for making ViT parameter-efficient. However, the performance of ViT depends heavily on the number of data samples. We therefore propose a new Data Augmentation method in which the augmentation strength varies adaptively, which helps ViT converge faster and perform better. With our Data Augmentation, we show that a ViT-based discriminator can achieve almost the same FID (Fréchet Inception Distance) with 35% fewer parameters than the original discriminator.
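
The abstract says the augmentation strength varies adaptively but does not state the update rule. As a rough illustration only, the sketch below borrows the feedback heuristic popularized by StyleGAN2-ADA, which is not confirmed to be the authors' method: the strength p rises when the discriminator looks overconfident on real images and falls otherwise. The class name, the target of 0.6, and the step size are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class AdaptiveAugmentController:
    """Hypothetical helper that tracks an augmentation probability p in [0, 1]."""

    target: float = 0.6  # assumed target value for the overfitting signal
    step: float = 0.01   # assumed change in p per update
    p: float = 0.0       # current augmentation strength

    def overfit_signal(self, real_logits: Sequence[float]) -> float:
        """Mean sign of the discriminator outputs on real images, in [-1, 1]."""
        if not real_logits:
            raise ValueError("need at least one logit")
        signs = [(x > 0) - (x < 0) for x in real_logits]
        return sum(signs) / len(signs)

    def update(self, real_logits: Sequence[float]) -> float:
        """Raise p when the signal exceeds the target, lower it otherwise."""
        r = self.overfit_signal(real_logits)
        direction = 1.0 if r > self.target else -1.0
        # Clamp so that p always remains a valid probability.
        self.p = min(1.0, max(0.0, self.p + direction * self.step))
        return self.p


if __name__ == "__main__":
    controller = AdaptiveAugmentController(step=0.1)
    # All logits positive: signal is 1.0, above the target, so p increases.
    print(controller.update([2.0, 1.5, 3.0, 0.7]))
    # Mixed logits: signal is 0.0, below the target, so p decreases.
    print(controller.update([0.5, -0.2, -1.0, 0.1]))
```

In a training loop, the returned p would be used as the probability of applying each augmentation to the images shown to the discriminator. Which augmentations to apply, and how often to call update, are left open here because the abstract does not specify them.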

Original language: English
Title of host publication: 2021 3rd International Conference on Computer Communication and the Internet, ICCCI 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 185-189
Number of pages: 5
ISBN (Electronic): 9781728176185
DOI
Publication status: Published - Jun 25, 2021
Event: 3rd International Conference on Computer Communication and the Internet, ICCCI 2021 - Virtual, Nagoya, Japan
Duration: Jun 25, 2021 to Jun 27, 2021

Publication series

Name: 2021 3rd International Conference on Computer Communication and the Internet, ICCCI 2021

Conference

Conference: 3rd International Conference on Computer Communication and the Internet, ICCCI 2021
Country/Territory: Japan
City: Virtual, Nagoya
Period: 21/6/25 to 21/6/27

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Signal Processing
  • Information Systems and Management
