TY - GEN
T1 - Data Augmentation for Ancient Characters via Semi-MixFontGan
AU - Yuan, Zhiyi
AU - Kamata, Sei Ichiro
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/8/26
Y1 - 2020/8/26
N2 - Ancient documents give people a way to understand history. However, the existing materials suffer from class-imbalanced character datasets and from intra-class multimodality of fonts. As a result, neither humans nor recognition systems can identify these characters effectively. To address these problems, we propose Semi-MixFontGan: a font generation method based on a semi-supervised strategy that learns from a small amount of labeled font data to aggregate the subclass information of each category and generate characters. When generating new samples from ancient books that have only a small amount of labeled font data, the model automatically learns the differences between fonts and generates font-consistent characters. The model is composed of two parts. In the first part, we propose a MixFont method that mixes labeled, unlabeled, and generated data, and then use a convolutional autoencoder to learn the font information. In the second part, the generator network produces reasonable and realistic images, guided by the Font and Content Discriminators. With this model, the ancient book dataset can be made more balanced. Experiments show that the characters generated by our model have good visual quality and maintain font consistency with the training data. With the augmented data, the accuracy of the recognition network increases. Contribution: We propose a novel font generation method with semi-supervised learning to generate characters from a Kuzushiji dataset with a small amount of labeled font data.
KW - GAN
KW - Semi-Supervised Learning
KW - Style Transfer
UR - http://www.scopus.com/inward/record.url?scp=85099877061&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85099877061&partnerID=8YFLogxK
U2 - 10.1109/ICIEVicIVPR48672.2020.9306588
DO - 10.1109/ICIEVicIVPR48672.2020.9306588
M3 - Conference contribution
AN - SCOPUS:85099877061
T3 - 2020 Joint 9th International Conference on Informatics, Electronics and Vision and 2020 4th International Conference on Imaging, Vision and Pattern Recognition, ICIEV and icIVPR 2020
BT - 2020 Joint 9th International Conference on Informatics, Electronics and Vision and 2020 4th International Conference on Imaging, Vision and Pattern Recognition, ICIEV and icIVPR 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - Joint 9th International Conference on Informatics, Electronics and Vision and 4th International Conference on Imaging, Vision and Pattern Recognition, ICIEV and icIVPR 2020
Y2 - 26 August 2020 through 29 August 2020
ER -