A fast SVM training method for very large datasets

Boyang Li*, Qiangwei Wang, Jinglu Hu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

32 Citations (Scopus)

Abstract

In a standard support vector machine (SVM), the training process has O(n^3) time and O(n^2) space complexity, where n is the size of the training dataset. Training is therefore computationally infeasible for very large datasets. A natural remedy is to reduce the size of the training dataset. An SVM classifier depends only on its support vectors (SVs), which lie close to the separation boundary, so the samples that are likely to be SVs must be retained. In this paper, we propose a method based on the edge detection technique to detect these samples. To preserve the overall distribution properties, we also use a clustering algorithm such as K-means to calculate the centroids of clusters. The samples selected by the edge detector and the centroids of the clusters are used to reconstruct the training dataset. The reconstructed training dataset is smaller, which makes the training process much faster without degrading classification accuracy.
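The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the edge detector, so the code below swaps in a simple stand-in: a sample counts as a boundary candidate if any of its k nearest neighbours carries a different label. The function names, the parameters k and clusters_per_class, and the per-class clustering are assumptions made for illustration, not the authors' implementation. The brute-force neighbour search itself needs O(n^2) memory, so this is only meant to show the idea on small data.

```python
import numpy as np

def boundary_mask(X, y, k=5):
    """Stand-in edge detector (assumption): flag samples that have at least
    one differently labelled sample among their k nearest neighbours."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Brute-force pairwise squared distances; O(n^2) memory, sketch only.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a sample is not its own neighbour
    nn = np.argsort(d2, axis=1)[:, :k]
    return (y[nn] != y[:, None]).any(axis=1)

def kmeans_centroids(X, n_clusters, n_iter=20, seed=0):
    """Plain Lloyd's K-means; returns the cluster centroids."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        for j in range(n_clusters):
            members = X[assign == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

def reduce_training_set(X, y, k=5, clusters_per_class=10):
    """Boundary candidates plus per-class centroids form the reduced set."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mask = boundary_mask(X, y, k)
    X_parts, y_parts = [X[mask]], [y[mask]]
    for label in np.unique(y):
        Xc = X[y == label]
        c = kmeans_centroids(Xc, min(clusters_per_class, len(Xc)))
        X_parts.append(c)
        y_parts.append(np.full(len(c), label))
    return np.vstack(X_parts), np.concatenate(y_parts)

# Usage on two synthetic Gaussian classes (hypothetical data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (200, 2)), rng.normal(2, 1, (200, 2))])
y = np.array([0] * 200 + [1] * 200)
X_small, y_small = reduce_training_set(X, y)
```

The reduced arrays X_small and y_small can then be passed to any SVM trainer in place of the full dataset. Clustering each class separately is one way to give every centroid an unambiguous label; the abstract does not say how the authors label centroids.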

Original language: English
Title of host publication: 2009 International Joint Conference on Neural Networks, IJCNN 2009
Pages: 1784-1789
Number of pages: 6
Publication status: Published - 2009 Nov 20
Event: 2009 International Joint Conference on Neural Networks, IJCNN 2009 - Atlanta, GA, United States
Duration: 2009 Jun 14 to 2009 Jun 19

Publication series

Name: Proceedings of the International Joint Conference on Neural Networks

Conference

Conference: 2009 International Joint Conference on Neural Networks, IJCNN 2009
Country/Territory: United States
City: Atlanta, GA
Period: 09/6/14 to 09/6/19

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence
