RiFeGAN: Rich feature generation for text-to-image synthesis from prior knowledge

Jun Cheng, Fuxiang Wu*, Yanling Tian, Lei Wang, Dapeng Tao


Research output: Conference article · peer-reviewed

45 citations (Scopus)


Text-to-image synthesis is a challenging task that generates realistic images from a textual sequence, which usually contains far less information than the corresponding image and is therefore ambiguous and abstract. Such limited textual information describes a scene only partly, forcing the generator to complete the remaining details implicitly and often yielding low-quality images. To address this problem, we propose a novel rich-feature-generating text-to-image synthesis method, called RiFeGAN, that enriches the given description. To provide additional visual details while avoiding conflicts, RiFeGAN exploits an attention-based caption matching model to select and refine compatible candidate captions from prior knowledge. Given the enriched captions, RiFeGAN uses self-attentional embedding mixtures to extract features across them effectively and to further handle diverging features. It then exploits multi-caption attentional generative adversarial networks to synthesize images from those features. Experiments conducted on widely used datasets show that the model can effectively generate images from enriched captions and significantly improve the results.
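The abstract's "self-attentional embedding mixture" step can be illustrated with a minimal sketch: given sentence embeddings for several compatible captions, attention weights derived from pairwise similarity mix them into a single feature. The function name, the dot-product scoring, and the mean pooling below are illustrative assumptions, not the paper's exact SAEM module.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attentional_mixture(caption_embeddings):
    """Mix several caption embeddings into one feature vector.

    caption_embeddings: (n_captions, dim) array, one sentence embedding
    per enriched caption. Scaled dot-product self-attention weights the
    captions by mutual compatibility, and the pooled weighted sum gives
    a single mixed feature. Simplified sketch, not the exact module.
    """
    E = np.asarray(caption_embeddings, dtype=float)
    scores = E @ E.T / np.sqrt(E.shape[1])   # pairwise similarity scores
    weights = softmax(scores, axis=-1)       # attention over captions
    mixed = weights @ E                      # per-caption context vectors
    return mixed.mean(axis=0)                # pooled mixture feature
```

In this toy form, identical captions mix back to the same embedding, while diverging captions are averaged according to their mutual attention weights.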

Journal: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Publication status: Published - 2020
Event: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020 - Virtual, Online, United States
Duration: Jun 14, 2020 - Jun 19, 2020

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition
