TY - GEN
T1 - English and Taiwanese text categorization using N-gram based on Vector Space Model
AU - Suzuki, Makoto
AU - Yamagishi, Naohide
AU - Tsai, Yi Ching
AU - Ishida, Takashi
AU - Goto, Masayuki
PY - 2010/12/1
Y1 - 2010/12/1
N2 - In this paper, we present a new mathematical model based on a "Vector Space Model" and consider its implications. The proposed method is evaluated in several experiments in which we classify newspaper articles from two data sets: the English Reuters-21578 data set, a benchmark for automatic text categorization, and the Taiwanese China Times 2005 data set. It is shown that FRAM has good classification accuracy: the micro-averaged F-measure of the proposed method is 94.5% for English, but only 78.0% for Taiwanese. Although the proposed method is language-independent and provides a new perspective, improving classification accuracy for Taiwanese remains future work.
AB - In this paper, we present a new mathematical model based on a "Vector Space Model" and consider its implications. The proposed method is evaluated in several experiments in which we classify newspaper articles from two data sets: the English Reuters-21578 data set, a benchmark for automatic text categorization, and the Taiwanese China Times 2005 data set. It is shown that FRAM has good classification accuracy: the micro-averaged F-measure of the proposed method is 94.5% for English, but only 78.0% for Taiwanese. Although the proposed method is language-independent and provides a new perspective, improving classification accuracy for Taiwanese remains future work.
KW - Classification
KW - N-gram
KW - Newspaper
KW - Text mining
UR - http://www.scopus.com/inward/record.url?scp=78651327327&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78651327327&partnerID=8YFLogxK
U2 - 10.1109/ISITA.2010.5649453
DO - 10.1109/ISITA.2010.5649453
M3 - Conference contribution
AN - SCOPUS:78651327327
SN - 9781424460175
T3 - ISITA/ISSSTA 2010 - 2010 International Symposium on Information Theory and Its Applications
SP - 106
EP - 111
BT - ISITA/ISSSTA 2010 - 2010 International Symposium on Information Theory and Its Applications
T2 - 2010 20th International Symposium on Information Theory and Its Applications, ISITA 2010 and the 2010 20th International Symposium on Spread Spectrum Techniques and Applications, ISSSTA 2010
Y2 - 17 October 2010 through 20 October 2010
ER -