Detection of Editing Bursts and Extraction of Significant Keyphrases from Wikipedia Edit History

Zihang Chen, Mizuho Iwaihara*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In an online collaboration system such as Wikipedia, edit history is stored as revisions. Topics of articles or categories grow and fade over time, and this evolutionary information is retained in the edit history. We consider that a great amount of information related to real-life events is embedded in the edit history of such documents. This paper focuses on a particular temporal text mining task: effectively extracting keyphrases from burst periods in the edit history of Wikipedia articles or categories. We first combine the ARIMA model with a decay function to find typical edit burst periods, then perform keyphrase extraction on those burst periods to reveal the topics of the bursts. However, existing keyphrase extraction methods, such as TextRank, do not consider temporal trends in a text stream. In this paper, we propose TextRank_nfidf, which reflects temporal trends in phrase node weights by computing the smoothed difference of editing frequency between revisions. We confirm that the detected bursts and keyphrases match well with events along the timeline.
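The idea of folding a temporal signal into TextRank node weights can be illustrated with a minimal Python sketch. This is not the paper's TextRank_nfidf formula, which the abstract does not give. It assumes a plain word co-occurrence graph and uses a personalized PageRank whose prior is a smoothed frequency difference between two revisions. The function names (`temporal_weights`, `cooccurrence_graph`, `weighted_textrank`), the window size, the smoothing constant, and the sample token lists are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def temporal_weights(prev_tokens, curr_tokens, smoothing=1.0):
    """Illustrative node weight: positive frequency gain between two
    revisions plus an additive smoothing constant (an assumption, not
    the paper's definition)."""
    prev, curr = Counter(prev_tokens), Counter(curr_tokens)
    return {t: max(curr[t] - prev[t], 0) + smoothing for t in curr}


def cooccurrence_graph(tokens, window=3):
    """Undirected graph linking terms that occur within `window` tokens."""
    graph = defaultdict(set)
    for i, term in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != term:
                graph[term].add(other)
                graph[other].add(term)
    return graph


def weighted_textrank(tokens, node_weights, damping=0.85, iterations=50):
    """TextRank-style ranking where the teleport distribution is
    proportional to `node_weights` instead of uniform."""
    graph = cooccurrence_graph(tokens)
    if not graph:
        return {}
    total = sum(node_weights[t] for t in graph)
    prior = {t: node_weights[t] / total for t in graph}
    scores = dict(prior)
    for _ in range(iterations):
        scores = {
            t: (1 - damping) * prior[t]
            + damping * sum(scores[n] / len(graph[n]) for n in graph[t])
            for t in graph
        }
    return scores


# Hypothetical token lists for two consecutive revisions.
previous = "city hosts annual festival".split()
current = "earthquake hits city earthquake damage city festival".split()

weights = temporal_weights(previous, current)
ranking = weighted_textrank(current, weights)
for term, score in sorted(ranking.items(), key=lambda kv: -kv[1]):
    print(f"{term}\t{score:.3f}")
```

Because the prior only biases the random walk's restart step, terms whose frequency rises between revisions gain score relative to a uniform-prior run, while the graph structure still determines how that score spreads to neighbouring terms.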

Original language: English
Title of host publication: Big Data Analyses, Services, and Smart Data, BigDAS 2018
Editors: Wookey Lee, Carson K. Leung, Aziz Nasridinov
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 45-65
Number of pages: 21
ISBN (Print): 9789811587306
Publication status: Published - 2021
Event: 6th International Conference on Big Data Applications and Services, BigDAS 2018 - Zhengzhou, China
Duration: 2018 Aug 19 to 2018 Aug 22

Publication series

Name: Advances in Intelligent Systems and Computing
Volume: 899 AISC
ISSN (Print): 2194-5357
ISSN (Electronic): 2194-5365

Conference

Conference: 6th International Conference on Big Data Applications and Services, BigDAS 2018
Country/Territory: China
City: Zhengzhou
Period: 18/8/19 to 18/8/22

Keywords

  • Burst detection
  • Edit history
  • Keywords extraction
  • TextRank

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Computer Science (all)
