TY - JOUR
T1 - Oversampling the minority class in a multi-linear feature space for imbalanced data classification
AU - Liang, Peifeng
AU - Li, Weite
AU - Hu, Jinglu
N1 - Publisher Copyright: © 2018 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.
PY - 2018/10
Y1 - 2018/10
AB - This paper proposes a novel oversampling method for imbalanced data classification, in which the minority class samples are synthesized in a feature space so that the generated minority samples do not fall into the majority class regions. For this purpose, it introduces a multi-linear feature space (MLFS) based on a quasi-linear kernel, which is composed from a pretrained neural network (NN). By using the quasi-linear kernel, the proposed MLFS oversampling method avoids directly computing the Euclidean distances among the samples when oversampling the minority class, and avoids mapping the samples to a high-dimensional feature space, which makes it easy to apply to the classification of high-dimensional datasets. Moreover, because the method uses kernel learning rather than representation learning with the NN, and a kernel is usually less dependent on the specific problem, unsupervised learning or even transfer learning can easily be employed to pretrain the NN, so the imbalance problem need not be considered at the pretraining stage. Finally, a method is developed to generate the synthetic minority samples by computing the quasi-linear kernel matrix instead of directly computing very high-dimensional MLFS feature vectors. The proposed MLFS oversampling method is applied to several real-world datasets, including an image dataset, and simulation results confirm its effectiveness.
KW - imbalanced data classification
KW - kernel composition
KW - multi-linear feature space
KW - oversampling
KW - support vector machine
UR - http://www.scopus.com/inward/record.url?scp=85047724495&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85047724495&partnerID=8YFLogxK
U2 - 10.1002/tee.22715
DO - 10.1002/tee.22715
M3 - Article
AN - SCOPUS:85047724495
SN - 1931-4973
VL - 13
SP - 1483
EP - 1491
JO - IEEJ Transactions on Electrical and Electronic Engineering
JF - IEEJ Transactions on Electrical and Electronic Engineering
IS - 10
ER -