TY - GEN
T1 - Detection of Editing Bursts and Extraction of Significant Keyphrases from Wikipedia Edit History
AU - Chen, Zihang
AU - Iwaihara, Mizuho
N1 - Publisher Copyright: © 2021, Springer Nature Singapore Pte Ltd.
PY - 2021
Y1 - 2021
AB - In an online collaboration system such as Wikipedia, edit history is stored as revisions. Topics of articles or categories grow and fade over time, and this evolutionary information is retained in their edit history. We consider that a great deal of information related to real-life events is embedded in the edit history of such documents. This paper focuses on a particular temporal text mining task: effectively extracting keyphrases from burst periods in the edit history of Wikipedia articles or categories. We first combine the ARIMA model with a decay function to find typical edit burst periods, then perform keyphrase extraction on the burst periods to reveal the topics of the bursts. However, existing keyphrase extraction methods, such as TextRank, do not consider temporal trends in a text stream. We therefore propose TextRank_nfidf, which reflects temporal trends in phrase node weights by computing the smoothed difference of editing frequency between revisions. We confirm that the detected bursts and keyphrases match well with events along the timeline.
KW - Burst detection
KW - Edit history
KW - Keywords extraction
KW - TextRank
UR - http://www.scopus.com/inward/record.url?scp=85091599534&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85091599534&partnerID=8YFLogxK
U2 - 10.1007/978-981-15-8731-3_4
DO - 10.1007/978-981-15-8731-3_4
M3 - Conference contribution
AN - SCOPUS:85091599534
SN - 9789811587306
T3 - Advances in Intelligent Systems and Computing
SP - 45
EP - 65
BT - Big Data Analyses, Services, and Smart Data, BigDAS 2018
A2 - Lee, Wookey
A2 - Leung, Carson K.
A2 - Nasridinov, Aziz
PB - Springer Science and Business Media Deutschland GmbH
T2 - 6th International Conference on Big Data Applications and Services, BigDAS 2018
Y2 - 19 August 2018 through 22 August 2018
ER -