Binary document classification based on fast flux discriminant with similarity measure on word set

Keisuke Okubo, Gendo Kumoi*, Masayuki Goto

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

Abstract

Fast Flux Discriminant (FFD) is a high-performance nonlinear binary classifier that can build a classification model accounting for interactions between variables. To capture these interactions, FFD applies histogram-based kernel smoothing to subspaces formed from combinations of variables. However, when constructing subspaces, the original FFD must cover all variables, including combinations of variables with low interaction. As a result, the amount of computation increases exponentially as the dimension increases. In this study, we calculate the similarity between variables using Kullback-Leibler (KL) divergence. Based on the obtained similarities, the variables are then divided into subspaces, each consisting of similar variables. By using only combinations of variables that are likely to interact strongly, the method aims to reduce the amount of computation while maintaining classification accuracy. Simulation experiments with Japanese newspaper articles demonstrate the effectiveness of the proposed method.
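
The grouping step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how the KL divergence between variables is computed or how the division into subspaces is carried out. The sketch assumes that each word is represented by its normalized occurrence distribution over documents, that similarity is measured with a symmetrized KL divergence, and that words are grouped greedily into fixed-size subspaces. The names `symmetric_kl`, `group_similar_words`, and `group_size` are hypothetical.

```python
import numpy as np


def symmetric_kl(p, q, eps=1e-9):
    """Symmetrized KL divergence between two non-negative count vectors.

    Assumption: each vector is smoothed by eps and normalized to sum to 1.
    """
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))


def group_similar_words(counts, group_size):
    """Greedily partition word columns into groups of similar words.

    counts: (n_docs, n_words) array of word counts (hypothetical input format).
    Returns a list of lists of column indices, one list per subspace.
    """
    n_words = counts.shape[1]

    # Pairwise divergence between the document distributions of each word.
    dist = np.zeros((n_words, n_words))
    for i in range(n_words):
        for j in range(i + 1, n_words):
            dist[i, j] = dist[j, i] = symmetric_kl(counts[:, i], counts[:, j])

    # Take the first unassigned word as a seed and attach its nearest
    # remaining neighbours until the group reaches group_size.
    unassigned = list(range(n_words))
    groups = []
    while unassigned:
        seed = unassigned.pop(0)
        nearest = sorted(unassigned, key=lambda j: dist[seed, j])[: group_size - 1]
        for j in nearest:
            unassigned.remove(j)
        groups.append([seed] + nearest)
    return groups


# Toy example: words 0 and 2 occur in the same documents, as do words 1 and 3.
counts = np.array(
    [[5, 0, 4, 0],
     [5, 0, 6, 0],
     [0, 3, 0, 2],
     [0, 3, 0, 4]],
    dtype=float,
)
groups = group_similar_words(counts, group_size=2)
```

In this toy example, variable combinations would be formed only inside each group, so 2 within-group pairs are considered instead of all 6 pairs among the 4 words.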

Original language: English
Pages (from-to): 245-251
Number of pages: 7
Journal: Industrial Engineering and Management Systems
Volume: 18
Issue number: 2
Publication status: Published - 2019

Keywords

  • Binary classification
  • Interaction
  • KL divergence
  • Similarity
  • Text data

ASJC Scopus subject areas

  • Social Sciences(all)
  • Economics, Econometrics and Finance(all)
