Hybrid approach for Khmer unknown word POS guessing

Chenda Nou*, Wataru Kameyama

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Because new words are created every day and no lexicon is large enough to cover them all, unknown words become a serious problem in part-of-speech tagging. This paper presents a hybrid approach to handling the unknown word problem in Khmer part-of-speech tagging. The hybrid approach, which combines a rule-based model with a trigram model, uses both the internal structure of a word and the surrounding contextual information to predict the part of speech of unknown words. The proposed approach achieves accuracies of 88.9% and 78.2% on training and test data, respectively.
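
The combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the affix rules are English-like placeholders rather than Khmer word-formation rules, the function names (`rule_candidates`, `train_trigrams`, `guess_tag`) are hypothetical, and add-one smoothing is an assumed choice. Rules based on the word's internal structure narrow the candidate tags, and a tag trigram model then picks the candidate that best fits the two preceding tags.

```python
from collections import Counter

# Hypothetical affix rules (English-like placeholders, NOT the paper's Khmer rules).
PREFIX_RULES = {"un": {"ADJ"}, "re": {"VERB"}}
SUFFIX_RULES = {"ness": {"NOUN"}, "ly": {"ADV"}}


def rule_candidates(word):
    """Return tags suggested by the word's internal structure (may be empty)."""
    tags = set()
    for prefix, prefix_tags in PREFIX_RULES.items():
        if word.startswith(prefix):
            tags |= prefix_tags
    for suffix, suffix_tags in SUFFIX_RULES.items():
        if word.endswith(suffix):
            tags |= suffix_tags
    return tags


def train_trigrams(tagged_sentences):
    """Count tag trigrams and their two-tag contexts, padding with <s>."""
    trigrams, contexts, tagset = Counter(), Counter(), set()
    for sentence in tagged_sentences:
        tags = ["<s>", "<s>"] + [tag for _, tag in sentence]
        tagset.update(tags[2:])
        for a, b, c in zip(tags, tags[1:], tags[2:]):
            trigrams[(a, b, c)] += 1
            contexts[(a, b)] += 1
    return trigrams, contexts, tagset


def guess_tag(word, prev_tags, model):
    """Guess the tag of an unknown word from its form and the two previous tags."""
    trigrams, contexts, tagset = model
    # Rule-based step: restrict candidates; fall back to all tags if no rule fires.
    candidates = (rule_candidates(word) & tagset) or tagset
    a, b = prev_tags

    def score(tag):
        # Trigram step: add-one smoothed estimate of P(tag | a, b) (assumed smoothing).
        return (trigrams[(a, b, tag)] + 1) / (contexts[(a, b)] + len(tagset))

    # Sorting makes tie-breaking deterministic (alphabetical).
    return max(sorted(candidates), key=score)


# Toy corpus, for illustration only.
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB"), ("quietly", "ADV")],
]
model = train_trigrams(corpus)
print(guess_tag("blorf", ("<s>", "DET"), model))
```

In this sketch the rules act as a filter and the trigram model as the ranker; other ways of combining the two sources of evidence are possible, and the abstract does not specify which one the paper uses.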

Original language: English
Title of host publication: 2007 IEEE International Conference on Information Reuse and Integration, IEEE IRI-2007
Pages: 215-220
Number of pages: 6
Publication status: Published - 2007
Event: 2007 IEEE International Conference on Information Reuse and Integration, IEEE IRI-2007 - Las Vegas, NV, United States
Duration: 2007 Aug 13 to 2007 Aug 15

Publication series

Name: 2007 IEEE International Conference on Information Reuse and Integration, IEEE IRI-2007

Conference

Conference: 2007 IEEE International Conference on Information Reuse and Integration, IEEE IRI-2007
Country/Territory: United States
City: Las Vegas, NV
Period: 07/8/13 to 07/8/15

ASJC Scopus subject areas

  • Information Systems
  • Information Systems and Management
  • Electrical and Electronic Engineering
