RiFeGAN2: Rich Feature Generation for Text-to-Image Synthesis from Constrained Prior Knowledge

Jun Cheng, Fuxiang Wu*, Yanling Tian, Lei Wang, Dapeng Tao

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Text-to-image synthesis is a challenging task that generates realistic images from a textual description. The description contains limited information compared with the corresponding image and is often ambiguous and abstract, which complicates generation and leads to low-quality images. To address this problem, we propose a novel text-to-image synthesis method, called RiFeGAN2, that enriches the given description. To improve the enrichment quality while accelerating the enrichment process, RiFeGAN2 exploits a domain-specific constrained model to limit the search scope and then uses an attention-based caption matching model to refine the compatible candidate captions based on constrained prior knowledge. To improve the semantic consistency between the given description and the synthesized results, RiFeGAN2 employs improved SAEMs, SAEM2s, to better compact the features of the retrieved captions and to effectively emphasize the given description by incorporating centre-attention layers. Finally, multi-caption attentional GANs are exploited to synthesize images from those features. Experiments performed on widely used datasets show that the models can generate vivid images from enriched captions and effectively improve semantic consistency.
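
Not part of the published record, but to make the enrichment idea described above concrete, the following is a minimal sketch in plain numpy: constrained retrieval of candidate captions from a domain-specific knowledge base, attention-based weighting of the retrieved candidates, and a fusion step that keeps the given description central. All names, the toy embeddings, and the simple cosine/softmax weighting are assumptions for illustration; the actual RiFeGAN2 components (SAEM2s, centre-attention layers, multi-caption attentional GANs) are learned networks that are not reproduced here.

  import numpy as np

  def cosine(a, b):
      # Cosine similarity between two 1-D embedding vectors.
      return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

  def retrieve_candidates(query_emb, kb_embs, top_k=5):
      # Constrained retrieval: score the query only against a
      # domain-specific knowledge base and keep the top-k matches.
      scores = np.array([cosine(query_emb, e) for e in kb_embs])
      return np.argsort(-scores)[:top_k], scores

  def attention_fuse(query_emb, cand_embs):
      # Attention-weighted mixture of the retrieved caption embeddings,
      # blended so the original description stays emphasized
      # (a stand-in for the centre-attention-style reweighting).
      logits = np.array([cosine(query_emb, e) for e in cand_embs])
      weights = np.exp(logits) / np.exp(logits).sum()
      fused = weights @ cand_embs
      return 0.5 * query_emb + 0.5 * fused

  # Toy usage: 8-dim embeddings, a 20-caption knowledge base.
  rng = np.random.default_rng(0)
  kb = rng.normal(size=(20, 8))
  query = rng.normal(size=8)
  idx, _ = retrieve_candidates(query, kb, top_k=5)
  feature = attention_fuse(query, kb[idx])
  print(feature.shape)  # (8,) -- enriched feature fed to the generator

In the paper the fused feature then conditions the multi-caption attentional GAN; here it is simply printed to show the shape of the enriched representation.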

Original language: English
Pages (from-to): 5187-5200
Number of pages: 14
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 32
Issue number: 8
DOIs
Publication status: Published - 2022 Aug 1
Externally published: Yes

Keywords

  • Text-to-image synthesis
  • multiple captions
  • prior knowledge

ASJC Scopus subject areas

  • Media Technology
  • Electrical and Electronic Engineering
