TY - GEN
T1 - Asymptotic evaluation of distance measure on high dimensional vector spaces in text mining
AU - Goto, Masayuki
AU - Ishida, Takashi
AU - Suzuki, Makoto
AU - Hirasawa, Shigeichi
PY - 2008
Y1 - 2008
N2 - This paper discusses document classification problems in text mining from the viewpoint of asymptotic statistical analysis. In text mining, several heuristics are applied in practical analysis because of their experimental effectiveness in many case studies. A theoretical explanation of the performance of text mining techniques is needed, and such an explanation would give a clear understanding of that performance. In this paper, the performance of distance measures used to classify documents is analyzed from the new viewpoint of asymptotic analysis. We also discuss the asymptotic performance of the IDF measure used in the information retrieval field.
AB - This paper discusses document classification problems in text mining from the viewpoint of asymptotic statistical analysis. In text mining, several heuristics are applied in practical analysis because of their experimental effectiveness in many case studies. A theoretical explanation of the performance of text mining techniques is needed, and such an explanation would give a clear understanding of that performance. In this paper, the performance of distance measures used to classify documents is analyzed from the new viewpoint of asymptotic analysis. We also discuss the asymptotic performance of the IDF measure used in the information retrieval field.
UR - http://www.scopus.com/inward/record.url?scp=77951132642&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77951132642&partnerID=8YFLogxK
U2 - 10.1109/ISITA.2008.4895453
DO - 10.1109/ISITA.2008.4895453
M3 - Conference contribution
AN - SCOPUS:77951132642
SN - 9781424420698
T3 - 2008 International Symposium on Information Theory and its Applications, ISITA2008
BT - 2008 International Symposium on Information Theory and its Applications, ISITA2008
T2 - 2008 International Symposium on Information Theory and its Applications, ISITA2008
Y2 - 7 December 2008 through 10 December 2008
ER -