TY - GEN
T1 - Exploiting parallel corpus for handling out-of-vocabulary words
AU - Luo, Juan
AU - Tinsley, John
AU - Lepage, Yves
N1 - Publisher Copyright:
© 2013 by Juan Luo, John Tinsley, and Yves Lepage.
PY - 2013
Y1 - 2013
AB - This paper presents a hybrid model for handling out-of-vocabulary words in Japanese-to-English statistical machine translation output by exploiting a parallel corpus. As the Japanese writing system makes use of four different script sets (kanji, hiragana, katakana, and romaji), we treat these scripts differently. A machine transliteration model is built to transliterate out-of-vocabulary Japanese katakana words into English words. A Japanese dependency structure analyzer is employed to tackle out-of-vocabulary kanji and hiragana words. The evaluation results demonstrate that the approach is effective for addressing out-of-vocabulary word problems and decreasing the OOV rate in Japanese-to-English machine translation tasks.
UR - http://www.scopus.com/inward/record.url?scp=84922792037&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84922792037&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84922792037
T3 - 27th Pacific Asia Conference on Language, Information, and Computation, PACLIC 27
SP - 399
EP - 408
BT - 27th Pacific Asia Conference on Language, Information, and Computation, PACLIC 27
PB - National Chengchi University
T2 - 27th Pacific Asia Conference on Language, Information, and Computation, PACLIC 2013
Y2 - 21 November 2013 through 24 November 2013
ER -