RiFeGAN: Rich feature generation for text-to-image synthesis from prior knowledge

Jun Cheng, Fuxiang Wu*, Yanling Tian, Lei Wang, Dapeng Tao

*Corresponding author for this work

Research output: Conference article › peer-review

45 Citations (Scopus)

Abstract

Text-to-image synthesis is the challenging task of generating realistic images from a textual sequence that usually carries far less information than the corresponding image and is therefore ambiguous and abstract. Such limited text describes a scene only partially, forcing the generator to complete the remaining details implicitly, which complicates generation and leads to low-quality images. To address this problem, we propose a novel rich-feature-generation approach to text-to-image synthesis, called RiFeGAN, which enriches the given description. To provide additional visual details while avoiding conflicts, RiFeGAN exploits an attention-based caption matching model to select and refine compatible candidate captions from prior knowledge. Given the enriched captions, RiFeGAN uses self-attentional embedding mixtures to extract features across them effectively and to handle diverging features. It then exploits multi-caption attentional generative adversarial networks to synthesize images from those features. Experiments conducted on widely used datasets show that the models can effectively generate images from enriched captions and significantly improve the results.
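A minimal sketch of the self-attentional embedding mixture step described above: several candidate caption embeddings attend to one another via standard scaled dot-product self-attention and are pooled into a single enriched feature. PyTorch, the class name `SelfAttentionalEmbeddingMixture`, the layer sizes, and the mean pooling are all illustrative assumptions here, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttentionalEmbeddingMixture(nn.Module):
    """Illustrative mixture of multiple caption embeddings via self-attention.

    A sketch of the idea in the abstract, not the authors' code: each caption
    embedding attends to all candidate captions, so compatible details are
    shared and diverging features are down-weighted by the attention weights.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)

    def forward(self, caps: torch.Tensor) -> torch.Tensor:
        # caps: (num_captions, dim) embeddings of the enriched captions
        q, k, v = self.query(caps), self.key(caps), self.value(caps)
        scores = q @ k.t() / caps.size(-1) ** 0.5   # (N, N) pairwise scores
        attn = F.softmax(scores, dim=-1)
        mixed = attn @ v                            # each caption mixes in the others
        return mixed.mean(dim=0)                    # pooled enriched feature, (dim,)

# Usage: mix five hypothetical 256-d caption embeddings into one feature.
captions = torch.randn(5, 256)
mixer = SelfAttentionalEmbeddingMixture(256)
enriched = mixer(captions)  # shape: (256,)
```

In the paper this enriched feature would then condition the multi-caption attentional GAN that synthesizes the image; the sketch covers only the mixing step.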

Original language: English
Article number: 9156682
Pages (from-to): 10908-10917
Number of pages: 10
Journal: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Publication status: Published - 2020
Event: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020 - Virtual, Online, United States
Duration: Jun 14, 2020 - Jun 19, 2020

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition
