Data Augmentation for Ancient Characters via Semi-MixFontGan

Zhiyi Yuan, Sei Ichiro Kamata

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Citation (Scopus)

Abstract

Ancient documents give people a way to understand history. However, existing materials suffer from unbalanced character datasets and from intra-class multimodality of fonts. As a result, neither humans nor recognition systems can identify these characters effectively. To address these problems, we propose Semi-MixFontGan, a font generation method based on a semi-supervised strategy that learns from a small amount of labeled font data to aggregate the subclass information of each category and generate characters. When generating new samples from ancient books that have only a small amount of labeled font data, the model automatically learns the differences between fonts and generates font-consistent characters. The model consists of two parts. In the first part, we propose a MixFont method that mixes labeled, unlabeled, and generated data, and then use a convolutional autoencoder to learn the font information. In the second part, the generator network produces reasonable and realistic images with the help of Font and Content Discriminators. With this model, the ancient book dataset can be made more balanced. Experiments show that the characters generated by our model achieve good visual quality and remain font-consistent with the training data. With the augmented data, the accuracy of the recognition network increases. Contribution: We propose a novel font generation method with semi-supervised learning to generate characters from a small labeled-font Kuzushiji dataset.
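
The abstract does not say how MixFont combines its three data sources. As a rough illustration only, the sketch below assumes a mixup-style convex blend in which each labeled image is mixed with a randomly chosen unlabeled or generated image. The function names (`mix_samples`, `mix_batch`), the Beta(0.75, 0.75) mixing weight, and the choice to keep the labeled image dominant are all hypothetical and are not taken from the paper.

```python
import random


def mix_samples(a, b, lam):
    """Convex combination of two equally sized, flattened images."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def mix_batch(labeled, unlabeled, generated, rng):
    """Blend each labeled image with a random unlabeled or generated image.

    Assumption: unlabeled and generated images are pooled, and the mixing
    weight is drawn from Beta(0.75, 0.75) as in common mixup variants.
    """
    pool = unlabeled + generated
    mixed = []
    for image in labeled:
        partner = rng.choice(pool)
        lam = rng.betavariate(0.75, 0.75)
        lam = max(lam, 1.0 - lam)  # keep the labeled image dominant
        mixed.append(mix_samples(image, partner, lam))
    return mixed


# Toy usage: 2x2 images flattened to 4 pixel intensities in [0, 1].
rng = random.Random(0)
labeled = [[1.0, 1.0, 1.0, 1.0]]
unlabeled = [[0.0, 0.0, 0.0, 0.0]]
generated = [[0.0, 0.0, 0.0, 0.0]]
mixed = mix_batch(labeled, unlabeled, generated, rng)
```

In a pipeline of the kind the abstract describes, the mixed images would then be passed to the convolutional autoencoder that learns font information; that stage is not sketched here.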

Original language: English
Title of host publication: 2020 Joint 9th International Conference on Informatics, Electronics and Vision and 2020 4th International Conference on Imaging, Vision and Pattern Recognition, ICIEV and icIVPR 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728193311
Publication status: Published - 2020 Aug 26
Event: Joint 9th International Conference on Informatics, Electronics and Vision and 4th International Conference on Imaging, Vision and Pattern Recognition, ICIEV and icIVPR 2020 - Kitakyushu, Japan
Duration: 2020 Aug 26 to 2020 Aug 29

Publication series

Name: 2020 Joint 9th International Conference on Informatics, Electronics and Vision and 2020 4th International Conference on Imaging, Vision and Pattern Recognition, ICIEV and icIVPR 2020

Conference

Conference: Joint 9th International Conference on Informatics, Electronics and Vision and 4th International Conference on Imaging, Vision and Pattern Recognition, ICIEV and icIVPR 2020
Country/Territory: Japan
City: Kitakyushu
Period: 20/8/26 to 20/8/29

Keywords

  • GAN
  • Semi-Supervised Learning
  • Style Transfer

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Vision and Pattern Recognition
  • Information Systems
  • Electrical and Electronic Engineering
  • Instrumentation
