TY - GEN
T1 - Hybrid approach for Khmer unknown word POS guessing
AU - Nou, Chenda
AU - Kameyama, Wataru
PY - 2007
Y1 - 2007
N2 - Because new words are created every day and the lexicon is not large enough to cover all of them, unknown words are a serious problem in part-of-speech tagging. This paper presents a hybrid approach to handling the unknown word problem in Khmer part-of-speech tagging. The hybrid approach, which combines a rule-based model and a trigram model, uses both the internal structure of the word and the surrounding contextual information to predict the part of speech of unknown words. The proposed approach achieves accuracies of 88.9% and 78.2% on training and test data, respectively.
AB - Because new words are created every day and the lexicon is not large enough to cover all of them, unknown words are a serious problem in part-of-speech tagging. This paper presents a hybrid approach to handling the unknown word problem in Khmer part-of-speech tagging. The hybrid approach, which combines a rule-based model and a trigram model, uses both the internal structure of the word and the surrounding contextual information to predict the part of speech of unknown words. The proposed approach achieves accuracies of 88.9% and 78.2% on training and test data, respectively.
UR - http://www.scopus.com/inward/record.url?scp=47949109029&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=47949109029&partnerID=8YFLogxK
U2 - 10.1109/IRI.2007.4296623
DO - 10.1109/IRI.2007.4296623
M3 - Conference contribution
AN - SCOPUS:47949109029
SN - 1424414997
SN - 9781424414994
T3 - 2007 IEEE International Conference on Information Reuse and Integration, IEEE IRI-2007
SP - 215
EP - 220
BT - 2007 IEEE International Conference on Information Reuse and Integration, IEEE IRI-2007
T2 - 2007 IEEE International Conference on Information Reuse and Integration, IEEE IRI-2007
Y2 - 13 August 2007 through 15 August 2007
ER -