TY - GEN
T1 - Scientific keyphrase extraction
T2 - 17th China National Conference on Computational Linguistics, CCL 2018 and 6th International Symposium on Natural Language Processing Based on Naturally Annotated Big Data, NLP-NABD 2018
AU - Liu, Qianying
AU - Kawahara, Daisuke
AU - Li, Sujian
N1 - Funding Information: We thank the anonymous reviewers for their insightful comments on this paper. This work was partially supported by the National Natural Science Foundation of China (61572049 and 61273278).
N1 - Publisher Copyright: © Springer Nature Switzerland AG 2018.
PY - 2018
Y1 - 2018
AB - Keyphrase extraction can provide effective ways of organizing scientific documents. For this task, neural-based methods usually suffer from unstable performance due to data scarcity. In this paper, we adopt a two-step pipeline consisting of candidate extraction and keyphrase ranking, where candidate extraction is key to overall performance. In the candidate extraction step, to overcome the low recall of traditional rule-based methods, we propose a novel semi-supervised data augmentation method in which a neural tagging model and a discriminative classifier boost each other and yield more confident phrases as candidates. With more reasonable candidates, keyphrases are identified with improved recall. Experiments on SemEval 2017 Task 10 show that our model achieves competitive results.
KW - Keyphrase extraction
KW - Neural networks
KW - Semi-supervised learning
UR - http://www.scopus.com/inward/record.url?scp=85055441037&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85055441037&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-01716-3_16
DO - 10.1007/978-3-030-01716-3_16
M3 - Conference contribution
AN - SCOPUS:85055441037
SN - 9783030017156
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 183
EP - 194
BT - Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data - 17th China National Conference, CCL 2018, and 6th International Symposium, NLP-NABD 2018, Proceedings
A2 - Wang, Xiaojie
A2 - Liu, Ting
A2 - Sun, Maosong
A2 - Liu, Zhiyuan
A2 - Liu, Yang
PB - Springer Verlag
Y2 - 19 October 2018 through 21 October 2018
ER -