Exploiting parallel corpus for handling out-of-vocabulary words

Juan Luo, John Tinsley, Yves Lepage

Research output: Conference contribution

2 Citations (Scopus)

Abstract

This paper presents a hybrid model for handling out-of-vocabulary words in Japanese-to-English statistical machine translation output by exploiting a parallel corpus. As the Japanese writing system makes use of four different script sets (kanji, hiragana, katakana, and romaji), we treat these scripts differently. A machine transliteration model is built to transliterate out-of-vocabulary Japanese katakana words into English words. A Japanese dependency structure analyzer is employed to tackle out-of-vocabulary kanji and hiragana words. The evaluation results demonstrate that it is an effective approach for addressing out-of-vocabulary word problems and decreasing the OOV rate in Japanese-to-English machine translation tasks.
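
As a rough illustration of the script-based routing described in the abstract, the Python sketch below classifies each character of an out-of-vocabulary word by Unicode block and picks a handling strategy. It is only a sketch under stated assumptions: the function names `script_of` and `route_oov`, the route labels, and the rule for mixed-script words are hypothetical and are not taken from the paper, and only the basic Unicode blocks are covered.

```python
def script_of(ch: str) -> str:
    """Classify one character by its basic Unicode block (simplified)."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:  # includes the long-vowel mark U+30FC
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:  # CJK Unified Ideographs (basic block only)
        return "kanji"
    if ch.isascii() and ch.isalpha():
        return "romaji"
    return "other"


def route_oov(word: str) -> str:
    """Pick a (hypothetical) handler label for an out-of-vocabulary word."""
    scripts = {script_of(ch) for ch in word}
    if scripts == {"katakana"}:
        # Pure katakana words are typically loanwords: transliterate them.
        return "transliteration"
    if scripts & {"kanji", "hiragana"}:
        # Words containing kanji or hiragana go to dependency-based handling.
        return "dependency_analysis"
    # Romaji and anything else are left untouched in this sketch.
    return "passthrough"


if __name__ == "__main__":
    for w in ["コンピューター", "食べる", "NLP"]:
        print(w, route_oov(w))
```

In this sketch a word is sent to transliteration only when every character is katakana; how the paper itself treats mixed-script words is not stated in the abstract.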

Original language: English
Title of host publication: 27th Pacific Asia Conference on Language, Information, and Computation, PACLIC 27
Publisher: National Chengchi University
Pages: 399-408
Number of pages: 10
ISBN (Electronic): 9789860385670
Publication status: Published - 2013
Event: 27th Pacific Asia Conference on Language, Information, and Computation, PACLIC 2013 - Taipei, Taiwan, Province of China
Duration: 21 Nov 2013 to 24 Nov 2013

Publication series

Name: 27th Pacific Asia Conference on Language, Information, and Computation, PACLIC 27

Conference

Conference: 27th Pacific Asia Conference on Language, Information, and Computation, PACLIC 2013
Country/Territory: Taiwan, Province of China
City: Taipei
Period: 13/11/21 to 13/11/24

ASJC Scopus subject areas

  • Language and Linguistics
  • Computer Science (all)
