Fast SVM training using data reconstruction for classification of very large datasets

Peifeng Liang, Weite Li, Jinglu Hu*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

3 Citations (Scopus)

Abstract

This paper proposes a fast support vector machine (SVM) training method for classifying very large datasets using data reconstruction. The idea is to scale down the training data by removing samples that are unlikely to become support vectors (SVs) in the feature space. To this end, the method applies a series of gradually refined rough SVM classifiers with a quasi-linear kernel to build rough separation boundaries and removes the samples that lie far from those boundaries. To make the proposed algorithm efficient for both low-dimensional and high-dimensional datasets, effort is made in three respects. The first is to compose a quasi-linear kernel from information about the data manifold and the potential separation boundary, so that the samples mapped to the feature space keep a sparse distribution, especially in the direction perpendicular to the separation boundary. The second is to avoid computing Euclidean distances between samples, which may lose their effectiveness on very high-dimensional datasets, when mapping the samples to the feature space and selecting the samples for training data reconstruction. The third is to design an iterative algorithm that gradually refines the rough SVM classifier so as to remove non-SVs efficiently. The proposed method is applied to several real-world large datasets and compared with other methods; the simulation results confirm its effectiveness, especially on very high-dimensional datasets.
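The reduction loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a plain linear SVM trained with Pegasos-style subgradient descent in place of the quasi-linear kernel, and the names `train_linear_svm`, `reduce_training_set`, `keep_ratio`, `rounds`, and `init_size` are hypothetical. Samples are ranked by the absolute value of the classifier's decision function, so no Euclidean distances between samples are computed.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Pegasos-style subgradient descent on the hinge loss (linear SVM).
    Assumption: stands in for the paper's quasi-linear-kernel SVM."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # standard Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1.0:  # sample violates the margin: hinge subgradient step
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def decision(w, x):
    """Signed decision value; its magnitude ranks closeness to the boundary."""
    return sum(wj * xj for wj, xj in zip(w, x))

def reduce_training_set(X, y, keep_ratio=0.5, rounds=3, init_size=50, seed=0):
    """Iteratively drop the samples farthest from a rough boundary.
    keep_ratio, rounds and init_size are illustrative, not from the paper."""
    rng = random.Random(seed)
    kept = list(range(len(X)))
    subset = rng.sample(kept, min(init_size, len(kept)))  # first rough model
    for r in range(rounds):
        w = train_linear_svm([X[i] for i in subset],
                             [y[i] for i in subset], seed=seed + r)
        kept.sort(key=lambda i: abs(decision(w, X[i])))
        kept = kept[:max(1, int(len(kept) * keep_ratio))]  # keep nearest samples
        subset = kept  # refine the next classifier on the reduced set
    return kept

def make_blobs(n_per_class=200, seed=1):
    """Two Gaussian blobs in 2-D; a constant 1.0 feature acts as the bias."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in ((1, 2.0), (-1, -2.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0), 1.0])
            y.append(label)
    return X, y

X, y = make_blobs()
kept = reduce_training_set(X, y)
final_w = train_linear_svm([X[i] for i in kept], [y[i] for i in kept])
print(len(X), "->", len(kept))
```

Each round halves the candidate set here, so the final classifier is trained on a small fraction of the original samples. A real implementation would need a selection rule that guards against discarding true support vectors, which the abstract does not specify.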

Original language: English
Pages (from-to): 372-381
Number of pages: 10
Journal: IEEJ Transactions on Electrical and Electronic Engineering
Volume: 15
Issue number: 3
Publication status: Published - 2020 Mar 1

Keywords

  • fast SVM training
  • large datasets
  • quasi-linear kernel
  • support vector machine
  • training data reconstruction

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
