TY - GEN
T1 - Weakly-Supervised Neural Categorization of Wikipedia Articles
AU - Chen, Xingyu
AU - Iwaihara, Mizuho
N1 - Publisher Copyright:
© Springer Nature Switzerland AG, 2019.
PY - 2019
Y1 - 2019
N2 - Deep neural models are gaining increasing popularity for many NLP tasks, due to their strong expressive power and reduced need for feature engineering. Neural models often need a large amount of labeled training documents. However, a single Wikipedia category may not contain enough articles for training. Weakly-supervised neural document classification can handle situations where only a small labeled document set is given. However, these RNN-based approaches often fail on long documents such as Wikipedia articles, because it is hard for them to retain memory of the important parts of a long document. To overcome these challenges, we propose a text summarization method called WS-Rank, which extracts the key sentences of documents by weighting them based on class-related keywords and sentence positions in the documents. After applying WS-Rank to training and test documents to summarize them into key sentences, weakly-supervised neural classification shows a remarkable improvement in classification results.
KW - Hierarchical category structure
KW - Neural classification
KW - Text classification
KW - Weakly-supervised learning
KW - Wikipedia category
UR - http://www.scopus.com/inward/record.url?scp=85076387926&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85076387926&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-34058-2_2
DO - 10.1007/978-3-030-34058-2_2
M3 - Conference contribution
AN - SCOPUS:85076387926
SN - 9783030340575
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 16
EP - 22
BT - Digital Libraries at the Crossroads of Digital Information for the Future - 21st International Conference on Asia-Pacific Digital Libraries, ICADL 2019, Proceedings
A2 - Jatowt, Adam
A2 - Maeda, Akira
A2 - Syn, Sue Yeon
PB - Springer
T2 - 21st International Conference on Asia-Pacific Digital Libraries, ICADL 2019
Y2 - 4 November 2019 through 7 November 2019
ER -