TY - JOUR
T1 - A note on document classification with small training data
AU - Maeda, Yasunari
AU - Yoshida, Hideki
AU - Suzuki, Masakiyo
AU - Matsushima, Toshiyasu
PY - 2011
Y1 - 2011
N2 - Document classification is one of the important topics in the field of natural language processing (NLP). In previous research, a document classification method was proposed that minimizes the error rate with respect to the Bayes criterion. However, when the number of documents in the training data is small, the accuracy of that method is low. In this research, we therefore use estimating data to estimate the prior distributions. When the training data is small, the accuracy obtained with estimating data is higher than that of the previous method, but when the training data is large, it is lower. We therefore also propose another technique whose accuracy is higher than that of the previous method when the training data is small and almost the same when the training data is large.
AB - Document classification is one of the important topics in the field of natural language processing (NLP). In previous research, a document classification method was proposed that minimizes the error rate with respect to the Bayes criterion. However, when the number of documents in the training data is small, the accuracy of that method is low. In this research, we therefore use estimating data to estimate the prior distributions. When the training data is small, the accuracy obtained with estimating data is higher than that of the previous method, but when the training data is large, it is lower. We therefore also propose another technique whose accuracy is higher than that of the previous method when the training data is small and almost the same when the training data is large.
KW - Document classification
KW - Posterior distribution
KW - Prior distribution
KW - Training data
UR - http://www.scopus.com/inward/record.url?scp=80052706462&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80052706462&partnerID=8YFLogxK
U2 - 10.1541/ieejeiss.131.1459
DO - 10.1541/ieejeiss.131.1459
M3 - Article
AN - SCOPUS:80052706462
SN - 0385-4221
VL - 131
SP - 1459
EP - 1466
JO - IEEJ Transactions on Electronics, Information and Systems
JF - IEEJ Transactions on Electronics, Information and Systems
IS - 8
ER -