Scientific keyphrase extraction: Extracting candidates with semi-supervised data augmentation

Qianying Liu, Daisuke Kawahara, Sujian Li*

*Corresponding author for this work

Research output: Conference contribution

2 Citations (Scopus)

Abstract

Keyphrase extraction can provide effective ways of organizing scientific documents. For this task, neural-based methods usually suffer from unstable performance due to data scarcity. In this paper, we adopt a two-step pipeline method consisting of candidate extraction and keyphrase ranking, where candidate extraction is key to overall performance. In the candidate extraction step, to overcome the low-recall problem of traditional rule-based methods, we propose a novel semi-supervised data augmentation method, in which a neural-based tagging model and a discriminative classifier boost each other to obtain more confident phrases as candidates. With more reasonable candidates, keyphrases are identified with improved recall. Experiments on SemEval 2017 Task 10 show that our model achieves competitive results.
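The two-step structure described in the abstract (extract candidates, then rank them) can be illustrated with a toy sketch. This is not the authors' method: the neural tagging model, the discriminative classifier, and the semi-supervised augmentation are not reproduced here. The names `STOPWORDS`, `extract_candidates`, and `rank_candidates` are hypothetical, and both the stopword rule and the frequency-times-length score are stand-in heuristics chosen only to make the pipeline shape concrete.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "the", "of", "and", "in", "to", "for", "is", "are", "with"}
BREAKS = {".", ",", ";", ":", "!", "?"}


def extract_candidates(text, max_len=3):
    """Step 1 (stand-in rule): n-grams of up to max_len words taken from
    runs of tokens that contain no stopword and no punctuation."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*|[.,;:!?]", text.lower())
    candidates, run = [], []
    for tok in tokens + ["."]:  # trailing sentinel flushes the last run
        if tok in BREAKS or tok in STOPWORDS:
            for i in range(len(run)):
                for j in range(i + 1, min(i + max_len, len(run)) + 1):
                    candidates.append(" ".join(run[i:j]))
            run = []
        else:
            run.append(tok)
    return candidates


def rank_candidates(candidates, top_k=5):
    """Step 2 (stand-in score): frequency times phrase length in words,
    with ties broken alphabetically."""
    counts = Counter(candidates)
    scored = sorted(
        counts.items(),
        key=lambda kv: (-kv[1] * len(kv[0].split()), kv[0]),
    )
    return [phrase for phrase, _ in scored[:top_k]]


text = "Keyphrase extraction helps organize documents. Keyphrase extraction is hard."
top = rank_candidates(extract_candidates(text), top_k=1)
print(top)
```

In a pipeline of this shape, phrases missed in step 1 can never be recovered in step 2, which is why the abstract singles out candidate recall as the bottleneck.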

Original language: English
Title of host publication: Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data - 17th China National Conference, CCL 2018, and 6th International Symposium, NLP-NABD 2018, Proceedings
Editors: Xiaojie Wang, Ting Liu, Maosong Sun, Zhiyuan Liu, Yang Liu
Publisher: Springer Verlag
Pages: 183-194
Number of pages: 12
ISBN (Print): 9783030017156
DOI
Publication status: Published - 2018
Externally published: Yes
Event: 17th China National Conference on Computational Linguistics, CCL 2018 and 6th International Symposium on Natural Language Processing Based on Naturally Annotated Big Data, NLP-NABD 2018 - Changsha, China
Duration: 19 Oct 2018 to 21 Oct 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11221 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 17th China National Conference on Computational Linguistics, CCL 2018 and 6th International Symposium on Natural Language Processing Based on Naturally Annotated Big Data, NLP-NABD 2018
Country/Territory: China
City: Changsha
Period: 19 Oct 2018 to 21 Oct 2018

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
