Deep learning with data augmentation to add data around classification boundaries

Hideki Fujinami*, Gendo Kumoi, Masayuki Goto

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Data augmentation methods are used to improve generalization in image classification by increasing the amount of training data. However, most of these methods are not data-driven, so the degree to which they improve generalization ability differs depending on the domain of the training image data. Recently, generative models have been studied as a means of augmenting data. In particular, Generative Adversarial Networks (GANs) (Goodfellow et al., 2014), which can generate clean images, have attracted attention as an excellent innovation in machine learning. One extension of GANs, called CGANs (Mirza and Osindero, 2014), can be used for data augmentation. However, when sufficient training data are not available for each class of a classification model, the same shortage affects the training of CGANs. In such cases, the CGAN generates noisy images, which causes the classification model to underfit the original training data. Moreover, when a CGAN approximates the training data distribution, it generates new data in the same regions where the training data are already dense. In such cases, the augmented data cannot reduce overfitting to the original training data. Our research therefore contributes a data augmentation approach that addresses both of these problems. In this study, we propose a method that generates data with class-specific GANs trained on small training sets and selectively adds the generated data to the training set, using the entropy of the classification model, in order to improve classification accuracy. The key feature of the proposed method is its focus on the positional relationship between data and the classification hyperplane in deep learning. Specifically, the entropy of the classification model is used to measure the positional relationship between the classification boundary and the data. As a result, adding data located around the classification boundary as new training data improves generalization performance.
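
The entropy-based selection step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`select_boundary_samples`, `predictive_entropy`, `augment`) are hypothetical, the generated samples and their labels are assumed to come from already-trained class-specific generators, and keeping the `k` highest-entropy samples is an assumed rule, since the abstract does not state a threshold or ordering. The intuition is that a classifier's softmax output is most uncertain (entropy close to its maximum, ln C for C classes) for inputs near its decision boundary.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax; subtracting the row max keeps exp() numerically stable."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def predictive_entropy(probs: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Shannon entropy (nats) of each row of class probabilities.

    Values near ln(num_classes) mean the classifier is maximally unsure,
    which this sketch uses as a proxy for 'close to the decision boundary'.
    """
    return -np.sum(probs * np.log(probs + eps), axis=1)


def select_boundary_samples(gen_x, gen_y, logits_fn, k):
    """Hypothetical helper: keep the k generated samples with highest entropy.

    gen_x     : (n, d) generated inputs (assumed to come from class-specific GANs)
    gen_y     : (n,)   class label of the generator that produced each sample
    logits_fn : callable mapping (n, d) inputs to (n, C) classifier logits
    ASSUMPTION: a top-k rule; the paper may use a threshold or another ordering.
    """
    h = predictive_entropy(softmax(logits_fn(gen_x)))
    idx = np.argsort(-h)[:k]  # indices of the k largest entropies, descending
    return gen_x[idx], gen_y[idx], h[idx]


def augment(train_x, train_y, new_x, new_y):
    """Append the selected generated samples to the original training set."""
    return np.concatenate([train_x, new_x]), np.concatenate([train_y, new_y])


# Toy usage: a fixed linear 2-class "classifier" whose boundary is x0 = 0.
W = np.array([[1.0, -1.0],
              [0.0, 0.0]])          # logits = [x0, -x0]
gen_x = np.array([[-3.0, 0.0],
                  [-0.1, 1.0],
                  [0.05, -1.0],
                  [2.5, 0.0]])
gen_y = np.array([0, 0, 1, 1])      # hypothetical generator labels
train_x = np.array([[-2.0, 0.5], [2.0, -0.5]])
train_y = np.array([0, 1])

sel_x, sel_y, sel_h = select_boundary_samples(gen_x, gen_y, lambda x: x @ W, k=2)
aug_x, aug_y = augment(train_x, train_y, sel_x, sel_y)
print(sel_x)  # rows [0.05, -1.0] and [-0.1, 1.0]: the two points nearest x0 = 0
```

In the toy example the linear classifier's boundary is x0 = 0, so the two generated points with x0 closest to zero have the highest entropy and are the ones added to the training set, while the points far from the boundary (x0 = -3.0 and x0 = 2.5) are discarded. A fixed entropy threshold, or retraining the classifier between additions, would be equally plausible variants; which one the paper uses is not stated in this abstract.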

Original language: English
Pages (from-to): 384-397
Number of pages: 14
Journal: Industrial Engineering and Management Systems
Volume: 20
Issue number: 3
Publication status: Published - 2021 Sept

Keywords

  • Convolutional Neural Network
  • Data Augmentation
  • Generative Adversarial Networks
  • Image Classification

ASJC Scopus subject areas

  • Social Sciences (all)
  • Economics, Econometrics and Finance (all)
