TY - GEN
T1 - What is your Mother Tongue?
T2 - 2016 IEEE International Conference on Big Data Analysis, ICBDA 2016
AU - Wang, Lan
AU - Tanaka, Masahiro
AU - Yamana, Hayato
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2016/7/12
Y1 - 2016/7/12
N2 - Native language identification (NLI) is the task of identifying an author's native language from essays written in the author's second language. In this work, we build a supervised NLI model based on a Chinese learner corpus. In the NLI field, this is the first work to (1) eliminate noisy data automatically before the training phase and (2) employ a BM25 term weighting technique to score each feature. We also adopt a hierarchical structure of linear support vector machine classifiers, achieving a state-of-the-art accuracy of 77.1%, which exceeds that of other Chinese NLI methods by over 10%.
KW - author profiling
KW - machine learning
KW - text mining
UR - http://www.scopus.com/inward/record.url?scp=84981333042&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84981333042&partnerID=8YFLogxK
U2 - 10.1109/ICBDA.2016.7509793
DO - 10.1109/ICBDA.2016.7509793
M3 - Conference contribution
AN - SCOPUS:84981333042
T3 - Proceedings of 2016 IEEE International Conference on Big Data Analysis, ICBDA 2016
BT - Proceedings of 2016 IEEE International Conference on Big Data Analysis, ICBDA 2016
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 12 March 2016 through 14 March 2016
ER -