TY - GEN
T1 - Two-stage incremental working set selection for fast support vector training on large datasets
AU - Nguyen, Duc Dung
AU - Matsumoto, Kazunori
AU - Takishima, Yasuhiro
AU - Hashimoto, Kazuo
AU - Terabe, Masahiro
PY - 2008
Y1 - 2008
N2 - We propose iSVM - an incremental algorithm that achieves high speed in training support vector machines (SVMs) on large datasets. In the common decomposition framework, iSVM starts with a minimum working set (WS), and then iteratively selects one training example to update the WS in each optimization loop. iSVM employs a two-stage strategy in processing the training data. In the first stage, the most prominent vector among randomly sampled data is added to the WS. This stage results in an approximate SVM solution. The second stage uses intermediate solutions to scan through the whole training data once again to find the remaining support vectors (SVs). We show that iSVM is especially efficient for training SVMs on applications where the data size is much larger than the number of SVs. On the KDD-CUP 1999 network intrusion detection dataset with nearly five million training examples, iSVM takes less than one hour to train an SVM with 94% testing accuracy, compared to seven hours with LibSVM - one of the state-of-the-art SVM implementations. We also provide analysis and experimental comparisons between iSVM and related algorithms.
KW - Decomposition method
KW - Optimization
KW - Sequential minimal optimization
KW - Support vector machine
UR - http://www.scopus.com/inward/record.url?scp=51949094920&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=51949094920&partnerID=8YFLogxK
U2 - 10.1109/RIVF.2008.4586359
DO - 10.1109/RIVF.2008.4586359
M3 - Conference contribution
AN - SCOPUS:51949094920
SN - 9781424423798
T3 - RIVF 2008 - 2008 IEEE International Conference on Research, Innovation and Vision for the Future in Computing and Communication Technologies
SP - 221
EP - 226
BT - RIVF 2008 - 2008 IEEE International Conference on Research, Innovation and Vision for the Future in Computing and Communication Technologies
T2 - RIVF 2008 - 2008 IEEE International Conference on Research, Innovation and Vision for the Future in Computing and Communication Technologies
Y2 - 13 July 2008 through 17 July 2008
ER -