Self-training involving semantic-space finetuning for semi-supervised multi-label document classification

Zhewei Xu*, Mizuho Iwaihara

*Corresponding author for this work

Research output: Article › peer-review

Abstract

Self-training is an effective solution for semi-supervised learning, in which both labeled and unlabeled data are leveraged for training. However, existing self-training frameworks are mostly confined to single-label classification. Applying self-training in the multi-label scenario is difficult because, unlike single-label classification, there is no mutual-exclusion constraint over categories, and the vast number of possible label vectors makes it harder to discover credible predictions. To realize effective self-training in the multi-label scenario, we propose ML-DST and ML-DST+, which utilize contextualized document representations from pretrained language models. We propose a BERT-based multi-label classifier together with newly designed weighted loss functions for finetuning. We also propose two label propagation-based algorithms, SemLPA and SemLPA+, to enhance multi-label prediction; their similarity measure is iteratively improved through semantic-space finetuning, in which the semantic space consisting of document representations is finetuned to better reflect learnt label correlations. High-confidence label predictions are recognized by examining the prediction score on each category separately, and these predictions are in turn used for both classifier finetuning and semantic-space finetuning. According to our experimental results, the performance of our approach steadily exceeds that of representative baselines under different label rates, demonstrating the superiority of the proposed approach.
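Two mechanisms from the abstract lend themselves to a brief illustration: the per-category confidence check used to harvest pseudo-labels, and label propagation over a semantic space of document embeddings. The following is a minimal PyTorch sketch, not the paper's implementation: the function names, the thresholds tau_pos and tau_neg, and the top-k cosine weighting are illustrative assumptions, and the actual SemLPA/SemLPA+ algorithms and weighted losses differ in detail.

import torch
import torch.nn.functional as F

# Minimal sketch (not the authors' code). Illustrates two ideas from the
# abstract: (1) recognizing high-confidence pseudo-labels by examining the
# prediction score on each category separately, and (2) propagating labels
# through a semantic space of document embeddings. The thresholds and the
# top-k weighting scheme are illustrative assumptions.

def select_confident_labels(logits: torch.Tensor,
                            tau_pos: float = 0.9,
                            tau_neg: float = 0.1):
    """Per-category pseudo-label selection for unlabeled documents.

    logits: (num_docs, num_categories) raw multi-label classifier outputs.
    Returns pseudo-labels and a mask marking which (doc, category) entries
    are confident enough to be used for further finetuning.
    """
    scores = torch.sigmoid(logits)               # independent score per category
    pseudo = (scores >= tau_pos).float()         # confident positives
    confident = (scores >= tau_pos) | (scores <= tau_neg)
    return pseudo, confident.float()             # uncertain entries are masked out

def propagate_labels(unlabeled_emb: torch.Tensor,
                     labeled_emb: torch.Tensor,
                     labeled_y: torch.Tensor,
                     k: int = 10) -> torch.Tensor:
    """Label propagation over cosine similarities in the semantic space.

    unlabeled_emb: (N, d) document embeddings (e.g., from a finetuned BERT).
    labeled_emb:   (M, d) embeddings of labeled documents.
    labeled_y:     (M, C) binary label vectors.
    Returns (N, C) propagated label scores, computed per category.
    """
    sim = F.cosine_similarity(unlabeled_emb.unsqueeze(1),
                              labeled_emb.unsqueeze(0), dim=-1)  # (N, M)
    top = sim.topk(k, dim=1)                     # k nearest labeled documents
    weights = torch.softmax(top.values, dim=1)   # (N, k) neighbor weights
    neighbor_y = labeled_y[top.indices]          # (N, k, C) neighbor labels
    return (weights.unsqueeze(-1) * neighbor_y).sum(dim=1)

In this sketch, semantic-space finetuning would change unlabeled_emb and labeled_emb between self-training rounds, so the cosine similarity (and hence the propagation result) improves as label correlations are learnt.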

Original language: English
Pages (from-to): 25-39
Number of pages: 15
Journal: International Journal on Digital Libraries
Volume: 25
Issue number: 1
DOI
Publication status: Published - Mar 2024

ASJC Scopus subject areas

  • Library and Information Sciences
