TY - GEN
T1 - Finding High Quality Documents through Link and Click Graphs
AU - Yu, Linfeng
AU - Iwaihara, Mizuho
PY - 2019/4/16
Y1 - 2019/4/16
AB - Link graphs of web pages have been used to evaluate the importance of each page. Existing link analysis algorithms, including HITS and PageRank, exploit the static link connectivity between pages. On the other hand, service providers often record HTTP requests containing the resource and the referrer of each request. From these logs, a click graph can be constructed whose edge weights represent the number of clicks on each link, that is, the link traffic. Click graphs reflect users' choices of interesting links, so they are useful for evaluating the importance of pages. However, clicks are often skewed toward highly popular links, so click graphs alone cannot properly evaluate less-clicked pages. In this paper, we propose the click count-weighted HITS algorithm, which integrates the HITS algorithm with click graphs to find high-quality documents. Our evaluations on finding featured articles in English Wikipedia show that the click count-weighted HITS algorithm performs better on a large Wikipedia corpus than algorithms that use link graphs or click graphs alone.
KW - Click graph
KW - Document ranking
KW - HITS algorithm
KW - Link analysis
KW - Quality
KW - Wikipedia
UR - http://www.scopus.com/inward/record.url?scp=85065214854&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85065214854&partnerID=8YFLogxK
U2 - 10.1109/IIAI-AAI.2018.00020
DO - 10.1109/IIAI-AAI.2018.00020
M3 - Conference contribution
AN - SCOPUS:85065214854
T3 - Proceedings - 2018 7th International Congress on Advanced Applied Informatics, IIAI-AAI 2018
SP - 49
EP - 54
BT - Proceedings - 2018 7th International Congress on Advanced Applied Informatics, IIAI-AAI 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 7th International Congress on Advanced Applied Informatics, IIAI-AAI 2018
Y2 - 8 July 2018 through 13 July 2018
ER -