Weakly-Supervised Neural Categorization of Wikipedia Articles

Xingyu Chen, Mizuho Iwaihara*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Deep neural models are gaining popularity for many NLP tasks because of their strong expressive power and reduced need for feature engineering. Neural models often need a large number of labeled training documents. However, a single Wikipedia category often does not contain enough articles for training. Weakly-supervised neural document classification can handle situations in which only a small labeled document set is given. However, these approaches are RNN-based and often fail on long documents such as Wikipedia articles, because it is hard for them to retain memory of the important parts of a long document. To overcome these challenges, we propose a text summarization method called WS-Rank, which extracts key sentences from documents, weighting them by class-related keywords and by sentence position in the document. After applying WS-Rank to training and test documents to summarize them into key sentences, weakly-supervised neural classification shows remarkable improvement in classification results.
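The key-sentence extraction idea described in the abstract can be illustrated with a minimal sketch. This is not the WS-Rank algorithm itself, whose scoring formula is not given here. It is a simplified stand-in that assumes a sentence's score is its count of class-related keyword hits multiplied by a weight that decays with sentence position. The function names, the `position_decay` parameter, and the scoring formula are all hypothetical.

```python
import re


def split_sentences(text):
    # Naive splitter (assumption): break after ., !, or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, keywords, k=2, position_decay=0.1):
    """Return the k highest-scoring sentences in their original order.

    Hypothetical scoring, not the published WS-Rank formula:
    score = (number of keyword hits) * 1 / (1 + position_decay * index)
    """
    sentences = split_sentences(text)
    keyword_set = {w.lower() for w in keywords}
    scored = []
    for index, sentence in enumerate(sentences):
        tokens = re.findall(r"[a-z0-9]+", sentence.lower())
        keyword_hits = sum(1 for t in tokens if t in keyword_set)
        # Earlier sentences receive a larger weight.
        position_weight = 1.0 / (1.0 + position_decay * index)
        scored.append((keyword_hits * position_weight, index))
    # Highest score first; ties are broken in favor of earlier sentences.
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:k]
    keep = sorted(index for _, index in top)
    return [sentences[i] for i in keep]


if __name__ == "__main__":
    article = (
        "Tokyo is the capital of Japan. It rains often. "
        "The city hosts many universities. Sushi is popular."
    )
    for sentence in summarize(article, ["capital", "city", "universities"]):
        print(sentence)
```

Under these assumptions, the summary produced for each document, rather than the full article text, would be passed to the downstream classifier, so that a sequence model sees only a short input concentrated on class-relevant sentences.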

Original language: English
Title of host publication: Digital Libraries at the Crossroads of Digital Information for the Future - 21st International Conference on Asia-Pacific Digital Libraries, ICADL 2019, Proceedings
Editors: Adam Jatowt, Akira Maeda, Sue Yeon Syn
Number of pages: 7
ISBN (Print): 9783030340575
Publication status: Published - 2019
Event: 21st International Conference on Asia-Pacific Digital Libraries, ICADL 2019 - Kuala Lumpur, Malaysia
Duration: 2019 Nov 4 to 2019 Nov 7

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11853 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: 21st International Conference on Asia-Pacific Digital Libraries, ICADL 2019
City: Kuala Lumpur


Keywords

  • Hierarchical category structure
  • Neural classification
  • Text classification
  • Weakly-supervised learning
  • Wikipedia category

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

