A note on document classification with small training data

Yasunari Maeda*, Hideki Yoshida, Masakiyo Suzuki, Toshiyasu Matsushima

*Corresponding author for this work

Research output: Article (peer-reviewed)

Abstract

Document classification is one of the important topics in natural language processing (NLP). Previous research proposed a document classification method that minimizes the error rate with respect to the Bayes criterion. However, when the training data contains only a small number of documents, the accuracy of that method is low. In this research, we therefore use estimating data to estimate the prior distributions. When the training data is small, the accuracy obtained with estimating data is higher than that of the previous method; when the training data is large, it is lower. We therefore also propose another technique whose accuracy is higher than that of the previous method when the training data is small and almost the same when the training data is large.
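The abstract does not state the probabilistic model or how the estimating data enters the prior, so the following is only an illustrative sketch under assumptions that are not taken from the paper: a Dirichlet-multinomial word model per class, classification by the highest posterior predictive probability, and a hypothetical helper `prior_from_estimating_data` that turns word frequencies in the estimating data into Dirichlet pseudo-counts. The parameters `strength`, `floor`, and `class_pseudo` are likewise invented for illustration.

```python
from collections import Counter
from math import lgamma, log


def fit_counts(docs, labels):
    """Collect per-class word counts and per-class document counts."""
    word_counts = {}
    doc_counts = Counter()
    for words, label in zip(docs, labels):
        word_counts.setdefault(label, Counter()).update(words)
        doc_counts[label] += 1
    return word_counts, doc_counts


def prior_from_estimating_data(est_docs, vocab, strength=10.0, floor=0.01):
    """Hypothetical prior: Dirichlet pseudo-counts proportional to word
    frequencies in the estimating data, plus a small floor per word."""
    freq = Counter(w for doc in est_docs for w in doc if w in vocab)
    total = sum(freq.values()) or 1
    return {w: floor + strength * freq[w] / total for w in vocab}


def log_predictive(doc, class_counts, alpha):
    """Log posterior predictive of a document under a Dirichlet-multinomial
    model (the multinomial coefficient is dropped; it is the same for all classes)."""
    x = Counter(w for w in doc if w in alpha)
    a_total = sum(alpha[w] + class_counts.get(w, 0) for w in alpha)
    score = lgamma(a_total) - lgamma(a_total + sum(x.values()))
    for w, k in x.items():
        a_w = alpha[w] + class_counts.get(w, 0)
        score += lgamma(a_w + k) - lgamma(a_w)
    return score


def classify(doc, word_counts, doc_counts, alpha, class_pseudo=1.0):
    """Return the class with the highest posterior probability for the document."""
    n_docs = sum(doc_counts.values())
    n_classes = len(doc_counts)

    def score(c):
        log_class = log((doc_counts[c] + class_pseudo)
                        / (n_docs + class_pseudo * n_classes))
        return log_class + log_predictive(doc, word_counts[c], alpha)

    return max(doc_counts, key=score)


# Toy data, invented for illustration only.
train_docs = [["goal", "match", "team"], ["vote", "party", "election"]]
train_labels = ["sports", "politics"]
est_docs = [["team", "goal", "vote"], ["party", "match", "election", "league"]]

vocab = {w for d in train_docs + est_docs for w in d}
alpha = prior_from_estimating_data(est_docs, vocab)
word_counts, doc_counts = fit_counts(train_docs, train_labels)
print(classify(["goal", "team"], word_counts, doc_counts, alpha))
```

With very little training data the pseudo-counts in `alpha` dominate the class statistics, and with a lot of training data they are swamped by the observed counts. This is the general mechanism by which a prior matters more for small training sets. The second technique mentioned in the abstract is not sketched here.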

Original language: English
Pages (from-to): 1459-1466
Number of pages: 8
Journal: IEEJ Transactions on Electronics, Information and Systems
Volume: 131
Issue: 8
Publication status: Published - 2011

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
