Scientific keyphrase extraction: Extracting candidates with semi-supervised data augmentation

Qianying Liu, Daisuke Kawahara, Sujian Li*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Keyphrase extraction can provide effective ways of organizing scientific documents. For this task, neural-based methods usually suffer from performance instability due to data scarcity. In this paper, we adopt a two-step pipeline method consisting of candidate extraction and keyphrase ranking, where candidate extraction is key to overall performance. In the candidate extraction step, to overcome the low-recall problem of traditional rule-based methods, we propose a novel semi-supervised data augmentation method, in which a neural-based tagging model and a discriminative classifier boost each other and yield more confident phrases as candidates. With more reasonable candidates, keyphrases are identified with improved recall. Experiments on SemEval 2017 Task 10 show that our model can achieve competitive results.
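The abstract gives no implementation detail, so the following is only a toy sketch of the general idea of a proposer and a scorer reinforcing each other over unlabeled text. It is not the authors' model: the hypothetical `propose` function stands in for the neural tagging model, the hypothetical `confidence` function stands in for the discriminative classifier, and the stopword list, threshold, and example sentences are all assumptions made for illustration.

```python
# Toy bootstrapping loop for candidate phrases (illustrative assumptions only).
STOPWORDS = {"a", "the", "of", "we", "use", "for", "and", "is", "with"}


def vocabulary(phrases):
    """All words that occur in the current pool of accepted phrases."""
    return {w for p in phrases for w in p.split()}


def propose(sentences, pool, max_len=3):
    """Stand-in for a tagger: propose stopword-free n-grams (2..max_len words)
    that share at least one word with the current pool."""
    vocab = vocabulary(pool)
    found = set()
    for sent in sentences:
        tokens = sent.lower().split()
        for i in range(len(tokens)):
            for n in range(2, max_len + 1):
                gram = tokens[i:i + n]
                if len(gram) < n or any(w in STOPWORDS for w in gram):
                    break  # longer n-grams from this start would fail too
                if any(w in vocab for w in gram):
                    found.add(" ".join(gram))
    return found - pool


def confidence(phrase, pool):
    """Stand-in for a classifier: fraction of the phrase's words already
    seen in accepted phrases."""
    vocab = vocabulary(pool)
    words = phrase.split()
    return sum(w in vocab for w in words) / len(words)


def augment(sentences, seeds, threshold=0.5, max_rounds=5):
    """Grow the candidate pool until no new confident phrase is found."""
    pool = set(seeds)
    for _ in range(max_rounds):
        confident = {p for p in propose(sentences, pool)
                     if confidence(p, pool) >= threshold}
        if not confident:
            break
        pool |= confident  # accepted phrases influence the next round
    return pool


sentences = [
    "we use a recurrent neural network for keyphrase extraction",
    "the recurrent model is trained with unlabeled data",
]
seeds = {"neural network", "keyphrase extraction"}
result = augment(sentences, seeds)
print(sorted(result))
```

In this toy run, a phrase accepted in one round widens the vocabulary, which lets a later round propose and accept a phrase from the second sentence. Phrases that never overlap with the pool, such as "unlabeled data", are not added. A real system would replace both stand-ins with trained models.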

Original language: English
Title of host publication: Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data - 17th China National Conference, CCL 2018, and 6th International Symposium, NLP-NABD 2018, Proceedings
Editors: Xiaojie Wang, Ting Liu, Maosong Sun, Zhiyuan Liu, Yang Liu
Publisher: Springer Verlag
Pages: 183-194
Number of pages: 12
ISBN (Print): 9783030017156
Publication status: Published - 2018
Externally published: Yes
Event: 17th China National Conference on Computational Linguistics, CCL 2018 and 6th International Symposium on Natural Language Processing Based on Naturally Annotated Big Data, NLP-NABD 2018 - Changsha, China
Duration: 2018 Oct 19 - 2018 Oct 21

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11221 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 17th China National Conference on Computational Linguistics, CCL 2018 and 6th International Symposium on Natural Language Processing Based on Naturally Annotated Big Data, NLP-NABD 2018
Country/Territory: China
City: Changsha
Period: 18/10/19 - 18/10/21

Keywords

  • Keyphrase extraction
  • Neural networks
  • Semi-supervised learning

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
