TY - GEN
T1 - Hit count reliability
T2 - 14th Asia Pacific Web Technology Conference, APWeb 2012
AU - Satoh, Koh
AU - Yamana, Hayato
PY - 2012/4/18
Y1 - 2012/4/18
AB - Many recent studies rely on the number of search results, i.e., the hit count. However, hit counts returned by search engines can vary unnaturally when observed on different days, and may contain large errors that affect research depending on them. Such errors can result in low-precision machine translation, incorrect synonym extraction, and other problems. It is therefore indispensable to evaluate and improve the reliability of hit counts. Several studies have demonstrated this phenomenon; however, none has made clear how far hit counts can be trusted. In this paper, we propose reliability metrics that quantitatively evaluate the reliability of hit counts in order to improve hit count selection. Evaluation results with Google show that our metrics successfully adopt reliable hit counts with 99.8% precision and skip unreliable hit counts with 74.3% precision.
KW - Hit Count
KW - Information Retrieval
KW - Reliability
KW - Search Engine
UR - http://www.scopus.com/inward/record.url?scp=84859729401&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84859729401&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-29253-8_73
DO - 10.1007/978-3-642-29253-8_73
M3 - Conference contribution
AN - SCOPUS:84859729401
SN - 9783642292521
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 751
EP - 758
BT - Web Technologies and Applications - 14th Asia-Pacific Web Conference, APWeb 2012, Proceedings
Y2 - 11 April 2012 through 13 April 2012
ER -