TY - JOUR
T1 - RiFeGAN: Rich Feature Generation for Text-to-Image Synthesis From Prior Knowledge
T2 - 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020
AU - Cheng, Jun
AU - Wu, Fuxiang
AU - Tian, Yanling
AU - Wang, Lei
AU - Tao, Dapeng
N1 - Funding Information:
This work was supported in part by the National Natural Science Foundation of China (U1713213, 61772508, 61772455, U1913202, U1813205), and in part by the CAS Key Technology Talent Program.
Publisher Copyright:
© 2020 IEEE.
PY - 2020
Y1 - 2020
AB - Text-to-image synthesis is a challenging task that generates realistic images from a textual sequence, which usually contains limited information compared with the corresponding image and is therefore ambiguous and abstractive. The limited textual information describes a scene only partly, forcing the generator to complete the remaining details implicitly and leading to low-quality images. To address this problem, we propose a novel rich-feature-generating text-to-image synthesis method, called RiFeGAN, to enrich the given description. To provide additional visual details and avoid conflicts, RiFeGAN exploits an attention-based caption matching model to select and refine compatible candidate captions from prior knowledge. Given the enriched captions, RiFeGAN uses self-attentional embedding mixtures to extract features across them effectively and to further handle diverging features. It then exploits multi-caption attentional generative adversarial networks to synthesize images from those features. Experiments conducted on widely used datasets show that the models can effectively generate images from the enriched captions and significantly improve the results.
UR - http://www.scopus.com/inward/record.url?scp=85094596886&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85094596886&partnerID=8YFLogxK
U2 - 10.1109/CVPR42600.2020.01092
DO - 10.1109/CVPR42600.2020.01092
M3 - Conference article
AN - SCOPUS:85094596886
SN - 1063-6919
SP - 10908
EP - 10917
JO - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
JF - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
M1 - 9156682
Y2 - 14 June 2020 through 19 June 2020
ER -