TY - GEN
T1 - Extraction of lexical bundles used in natural language processing articles
AU - Goh, Chooi Ling
AU - Lepage, Yves
N1 - Funding Information:
This work was supported by JSPS KAKENHI Grant Number JP18K11446.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/10
Y1 - 2019/10
AB - Lexical bundles are indispensable for fluent academic writing. They might not constitute complete structural units, but they occur very frequently in academic conversations, conference presentations and scientific articles. This paper shows how to collect a large database of lexical bundles from articles in the Natural Language Processing (NLP) domain. We first collect highly frequent N-grams from the ACL-ARC collection of NLP articles and then classify them as true or false lexical bundles using machine learning models trained on a set of manually checked bundles. In a verification experiment, our best model achieves an accuracy of 76%. Using this model, we extract more than 18,000 lexical bundles from the ACL-ARC corpus, which we publicly release.
UR - http://www.scopus.com/inward/record.url?scp=85081090863&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85081090863&partnerID=8YFLogxK
U2 - 10.1109/ICACSIS47736.2019.8979950
DO - 10.1109/ICACSIS47736.2019.8979950
M3 - Conference contribution
AN - SCOPUS:85081090863
T3 - 2019 International Conference on Advanced Computer Science and Information Systems, ICACSIS 2019
SP - 223
EP - 228
BT - 2019 International Conference on Advanced Computer Science and Information Systems, ICACSIS 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 11th International Conference on Advanced Computer Science and Information Systems, ICACSIS 2019
Y2 - 12 October 2019 through 13 October 2019
ER -