Finding co-occurring topics in Wikipedia article segments

Renzhi Wang*, Jianmin Wu, Mizuho Iwaihara

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)


Wikipedia is the largest online encyclopedia, and its articles form a rich knowledge and semantic resource. When the same topic appears in different articles, this indicates that those articles are related through that topic. Finding such co-occurring topics is useful for improving the accuracy of querying and clustering, and for contrasting related articles. Existing work on topic alignment and topic relevance detection relies on term occurrence. In our research, we discuss incorporating the latent topics present in article segments, obtained with Latent Dirichlet Allocation (LDA), to detect topic relevance. We also study how segment proximities, arising from segment ordering and hyperlinks, should be incorporated into topic detection and alignment. Experimental results show that our method can find and distinguish three types of co-occurrence.
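The abstract does not state how topic relevance between segments is scored or how proximity is combined with it. As a rough illustration only, the sketch below assumes each segment already has an LDA topic distribution (inferred by any LDA implementation), scores relevance as 1 minus the Jensen-Shannon divergence in bits, and multiplies it by a hypothetical proximity prior: full weight for hyperlinked segments, otherwise exponential decay with distance in segment order. The function names and the `alpha` and `threshold` parameters are illustrative assumptions, not the authors' method.

```python
import math
from typing import List, Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in bits; terms with p_k = 0 contribute nothing."""
    return sum(pk * math.log2(pk / qk) for pk, qk in zip(p, q) if pk > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in bits, bounded in [0, 1]."""
    m = [(pk + qk) / 2 for pk, qk in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def topic_relevance(p: Sequence[float], q: Sequence[float]) -> float:
    """Assumed relevance score: 1 for identical topic mixtures, 0 for disjoint ones."""
    return 1.0 - js_divergence(p, q)


def shared_topics(p: Sequence[float], q: Sequence[float], threshold: float = 0.1) -> List[int]:
    """Indices of LDA topics carrying at least `threshold` weight in both segments."""
    return [k for k, (pk, qk) in enumerate(zip(p, q)) if pk >= threshold and qk >= threshold]


def proximity_weight(order_distance: int, hyperlinked: bool, alpha: float = 0.5) -> float:
    """Hypothetical proximity prior: 1.0 for hyperlinked segments,
    otherwise exp(-alpha * distance in segment order)."""
    if hyperlinked:
        return 1.0
    return math.exp(-alpha * order_distance)


def co_occurrence_score(p, q, order_distance: int, hyperlinked: bool, alpha: float = 0.5) -> float:
    """Hypothetical combined score: topic relevance scaled by the proximity prior."""
    return topic_relevance(p, q) * proximity_weight(order_distance, hyperlinked, alpha)


if __name__ == "__main__":
    # Toy topic distributions over three LDA topics (made-up values).
    seg_a = [0.70, 0.20, 0.10]
    seg_b = [0.05, 0.60, 0.35]
    print(shared_topics(seg_a, seg_b))  # [1, 2]
    print(co_occurrence_score(seg_a, seg_b, order_distance=2, hyperlinked=False))
```

Multiplying by a prior keeps the topic signal and the structural signal separable, so either part can be swapped out (for example, a different divergence or a learned proximity weight) without touching the other.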

Original language: English
Title of host publication: The Emergence of Digital Libraries - Research and Practices - 16th International Conference on Asia-Pacific Digital Libraries, ICADL 2014, Proceedings
Editors: Adam Jatowt, Edie Rasmussen, Kulthida Tuamsuk
Publisher: Springer Verlag
Number of pages: 8
ISBN (Electronic): 9783319128221
Publication status: Published - 2014 Jan 1
Event: 16th International Conference on Asia-Pacific Digital Libraries, ICADL 2014 - Chiang Mai, Thailand
Duration: 2014 Nov 5 to 2014 Nov 7

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: 16th International Conference on Asia-Pacific Digital Libraries, ICADL 2014
City: Chiang Mai


Keywords

  • LDA
  • Link
  • MLE
  • Wikipedia

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

