TY - GEN
T1 - Supervised Learning of Keyphrase Extraction Utilizing Prior Summarization
AU - Liu, Tingyi
AU - Iwaihara, Mizuho
N1 - Publisher Copyright:
© 2021, Springer Nature Switzerland AG.
PY - 2021
Y1 - 2021
N2 - Keyphrase extraction is the task of selecting a set of phrases that best represent a given document. Keyphrase extraction is used in document indexing and categorization, making it one of the core technologies of digital libraries. Supervised keyphrase extraction based on pretrained language models is advantageous through their contextualized text representations. In this paper, we present an adaptation of the pretrained language model BERT to keyphrase extraction, called BERT Keyphrase-Rank (BK-Rank), based on a cross-encoder architecture. However, the accuracy of BK-Rank alone suffers when documents contain a large number of candidate phrases, especially in long documents. Based on the notion that keyphrases are more likely to occur in representative sentences of the document, we propose a new approach called Keyphrase-Focused BERT Summarization (KFBS), which extracts important sentences as a summary, from which BK-Rank can more easily find keyphrases. KFBS is trained by distant supervision, such that sentences lexically similar to the keyphrase set are chosen as positive samples. Our experimental results show that the combination of KFBS + BK-Rank achieves superior performance over the compared baseline methods on four well-known benchmark collections, especially on long documents.
AB - Keyphrase extraction is the task of selecting a set of phrases that best represent a given document. Keyphrase extraction is used in document indexing and categorization, making it one of the core technologies of digital libraries. Supervised keyphrase extraction based on pretrained language models is advantageous through their contextualized text representations. In this paper, we present an adaptation of the pretrained language model BERT to keyphrase extraction, called BERT Keyphrase-Rank (BK-Rank), based on a cross-encoder architecture. However, the accuracy of BK-Rank alone suffers when documents contain a large number of candidate phrases, especially in long documents. Based on the notion that keyphrases are more likely to occur in representative sentences of the document, we propose a new approach called Keyphrase-Focused BERT Summarization (KFBS), which extracts important sentences as a summary, from which BK-Rank can more easily find keyphrases. KFBS is trained by distant supervision, such that sentences lexically similar to the keyphrase set are chosen as positive samples. Our experimental results show that the combination of KFBS + BK-Rank achieves superior performance over the compared baseline methods on four well-known benchmark collections, especially on long documents.
KW - Document indexing
KW - Extractive summarization
KW - Keyphrase extraction
KW - Pretrained language model
KW - Supervised learning
UR - http://www.scopus.com/inward/record.url?scp=85121932982&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85121932982&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-91669-5_13
DO - 10.1007/978-3-030-91669-5_13
M3 - Conference contribution
AN - SCOPUS:85121932982
SN - 9783030916688
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 157
EP - 166
BT - Towards Open and Trustworthy Digital Societies - 23rd International Conference on Asia-Pacific Digital Libraries, ICADL 2021, Proceedings
A2 - Ke, Hao-Ren
A2 - Lee, Chei Sian
A2 - Sugiyama, Kazunari
PB - Springer Science and Business Media Deutschland GmbH
T2 - 23rd International Conference on Asia-Pacific Digital Libraries, ICADL 2021
Y2 - 1 December 2021 through 3 December 2021
ER -