Weakly-Supervised Neural Categorization of Wikipedia Articles

Xingyu Chen, Mizuho Iwaihara*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Deep neural models are becoming increasingly popular for many NLP tasks because of their strong expressive power and reduced need for feature engineering. However, neural models often need a large number of labeled training documents, and a single Wikipedia category does not contain enough articles for training. Weakly-supervised neural document classification can handle situations where only a small set of labeled documents is given. However, these RNN-based approaches often fail on long documents such as Wikipedia articles, because it is hard for them to retain memory of the important parts of a long document. To overcome these challenges, we propose a text summarization method called WS-Rank, which extracts key sentences from documents, weighting them by class-related keywords and by sentence position in the document. After WS-Rank is applied to training and test documents to summarize them into key sentences, weakly-supervised neural classification shows a remarkable improvement in classification results.
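The abstract gives only the general idea behind WS-Rank, so the following is a minimal illustrative sketch of keyword- and position-weighted key-sentence extraction, not the authors' method. It swaps in a plain per-sentence scoring heuristic; the function names, the `position_decay` parameter, the scoring formula, and the naive sentence splitter are all assumptions made for illustration.

```python
import re


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def score_sentence(sentence, index, keywords, position_decay=0.1):
    # Keyword weight: count tokens that match a class-related keyword.
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    keyword_hits = sum(1 for t in tokens if t in keywords)
    # Position weight: earlier sentences receive a larger weight.
    position_weight = 1.0 / (1.0 + position_decay * index)
    # Hypothetical combination of the two signals; not taken from the paper.
    return (1 + keyword_hits) * position_weight


def key_sentences(text, keywords, top_k=2):
    sentences = split_sentences(text)
    keywords = {k.lower() for k in keywords}
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: score_sentence(sentences[i], i, keywords),
        reverse=True,
    )
    # Return the top-k sentences in their original document order.
    return [sentences[i] for i in sorted(ranked[:top_k])]


if __name__ == "__main__":
    doc = (
        "Tokyo is the capital of Japan. It rains a lot in June. "
        "The city hosts many football clubs and football stadiums. "
        "People like tea."
    )
    for sentence in key_sentences(doc, {"football", "stadiums"}):
        print(sentence)
```

In a pipeline like the one the abstract describes, a summary of this kind would replace the full article text in both the training and the test documents before they are passed to the classifier.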

Original language: English
Title of host publication: Digital Libraries at the Crossroads of Digital Information for the Future - 21st International Conference on Asia-Pacific Digital Libraries, ICADL 2019, Proceedings
Editors: Adam Jatowt, Akira Maeda, Sue Yeon Syn
Publisher: Springer
Pages: 16-22
Number of pages: 7
ISBN (Print): 9783030340575
Publication status: Published - 2019
Event: 21st International Conference on Asia-Pacific Digital Libraries, ICADL 2019 - Kuala Lumpur, Malaysia
Duration: 2019 Nov 4 to 2019 Nov 7

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11853 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 21st International Conference on Asia-Pacific Digital Libraries, ICADL 2019
Country/Territory: Malaysia
City: Kuala Lumpur
Period: 19/11/4 to 19/11/7

Keywords

  • Hierarchical category structure
  • Neural classification
  • Text classification
  • Weakly-supervised learning
  • Wikipedia category

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
