Unsupervised Keyphrase Generation by Utilizing Masked Words Prediction and Pseudo-label BART Finetuning

Yingchao Ju, Mizuho Iwaihara*

*Corresponding author for this work

Research output: Conference contribution

Abstract

A keyphrase is a short phrase of one or a few words that summarizes a key idea discussed in a document. Keyphrase generation is the task of predicting both present keyphrases (which appear in the document) and absent keyphrases (which do not) from a given document. Recent approaches based on the sequence-to-sequence (Seq2Seq) deep learning framework have been widely used for keyphrase generation. However, the strong performance of these models comes at the cost of a large number of annotated documents. In this paper, we propose MLMPBKG, an unsupervised method based on a masked language model (MLM) and pseudo-label BART finetuning. We mask noun phrases in the article and apply an MLM to predict replacement words, and we observe that absent keyphrases can be found among these predicted words. Based on this observation, we first propose MLMKPG, which uses the MLM to generate keyphrase candidates and a sentence embedding model to rank them. We then use the top-ranked phrases as pseudo-labels to finetune BART and obtain additional absent keyphrases. Experimental results show that our method achieves strong results on both present and absent keyphrase prediction, even surpassing supervised baselines in certain cases.

Original language: English
Host publication title: From Born-Physical to Born-Virtual
Host publication subtitle: Augmenting Intelligence in Digital Libraries - 24th International Conference on Asian Digital Libraries, ICADL 2022, Proceedings
Editors: Yuen-Hsien Tseng, Marie Katsurai, Hoa N. Nguyen
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 21-34
Number of pages: 14
ISBN (Print): 9783031217555
DOI
Publication status: Published - 2022
Event: 24th International Conference on Asia-Pacific Digital Libraries, ICADL 2022 - Hanoi, Viet Nam
Duration: 30 Nov 2022 to 2 Dec 2022

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13636 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 24th International Conference on Asia-Pacific Digital Libraries, ICADL 2022
Country/Territory: Viet Nam
City: Hanoi
Period: 30 Nov 2022 to 2 Dec 2022

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
