Exploiting parallel corpus for handling out-of-vocabulary words

Juan Luo, John Tinsley, Yves Lepage

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

This paper presents a hybrid model for handling out-of-vocabulary (OOV) words in Japanese-to-English statistical machine translation output by exploiting a parallel corpus. As the Japanese writing system makes use of four different script sets (kanji, hiragana, katakana, and romaji), we treat these scripts differently. A machine transliteration model is built to transliterate out-of-vocabulary Japanese katakana words into English words. A Japanese dependency structure analyzer is employed to tackle out-of-vocabulary kanji and hiragana words. The evaluation results demonstrate that this is an effective approach for addressing the out-of-vocabulary word problem and decreasing the OOV rate in Japanese-to-English machine translation tasks.
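The abstract describes routing OOV words to different handlers depending on their script. The following is a minimal sketch of that routing step only, not the authors' implementation. It assumes that a token counts as OOV when it is absent from the training vocabulary and that the script can be decided from Unicode block ranges. The function names, the route labels, and the choice to leave romaji and mixed-script tokens unchanged are hypothetical illustrations; the transliteration model and dependency analyzer themselves are not shown.

```python
# Hypothetical sketch: route OOV tokens by Japanese script, as the abstract describes.
# Script detection uses Unicode block ranges (an assumption; a fuller system might
# use the Unicode Script property instead).

def char_script(ch: str) -> str:
    """Return a coarse script label for a single character."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:  # Hiragana block
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF or 0xFF66 <= cp <= 0xFF9F:  # Katakana and half-width katakana
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF or cp == 0x3005:  # CJK ideographs, iteration mark
        return "kanji"
    if ch.isascii() and ch.isalpha():  # Latin letters, treated as romaji
        return "romaji"
    return "other"


def word_script(word: str) -> str:
    """Classify a whole token; tokens mixing katakana or romaji with other scripts get their own label."""
    scripts = {char_script(ch) for ch in word} - {"other"}
    if scripts == {"katakana"}:
        return "katakana"
    if scripts == {"romaji"}:
        return "romaji"
    if scripts and scripts <= {"kanji", "hiragana"}:
        return "kanji/hiragana"
    return "mixed/other"


# Hypothetical handler labels: katakana -> transliteration model,
# kanji/hiragana -> dependency-structure analysis, everything else left unchanged.
ROUTES = {
    "katakana": "transliterate",
    "kanji/hiragana": "dependency_analysis",
    "romaji": "keep_as_is",
    "mixed/other": "keep_as_is",
}


def oov_rate(tokens: list[str], vocab: set[str]) -> float:
    """Fraction of tokens not found in the training vocabulary."""
    if not tokens:
        return 0.0
    return sum(tok not in vocab for tok in tokens) / len(tokens)


def route_oov(tokens: list[str], vocab: set[str]) -> list[tuple[str, str]]:
    """Pair each OOV token with the handler its script would be sent to."""
    return [(tok, ROUTES[word_script(tok)]) for tok in tokens if tok not in vocab]


if __name__ == "__main__":
    vocab = {"私", "は", "を", "食べ", "た"}
    tokens = ["私", "は", "アボカド", "を", "食べ", "た"]
    print(oov_rate(tokens, vocab))   # one OOV token out of six
    print(route_oov(tokens, vocab))  # [('アボカド', 'transliterate')]
```

Block-range tests are a simplification: a token such as 東京タワー mixes kanji and katakana, and the abstract does not say how such tokens are handled, so the sketch simply leaves them unchanged.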

Original language: English
Title of host publication: 27th Pacific Asia Conference on Language, Information, and Computation, PACLIC 27
Publisher: National Chengchi University
Pages: 399-408
Number of pages: 10
ISBN (Electronic): 9789860385670
Publication status: Published - 2013
Event: 27th Pacific Asia Conference on Language, Information, and Computation, PACLIC 2013 - Taipei, Taiwan, Province of China
Duration: 2013 Nov 21 → 2013 Nov 24

Publication series

Name: 27th Pacific Asia Conference on Language, Information, and Computation, PACLIC 27

Conference

Conference: 27th Pacific Asia Conference on Language, Information, and Computation, PACLIC 2013
Country/Territory: Taiwan, Province of China
City: Taipei
Period: 13/11/21 → 13/11/24

ASJC Scopus subject areas

  • Language and Linguistics
  • Computer Science(all)
