Improving automatic Chinese–Japanese patent translation using bilingual term extraction

Wei Yang*, Yves Lepage

*Corresponding author for this work

Research output: Article, peer-reviewed

1 citation (Scopus)

Abstract

The identification of terms in scientific and patent documents is a crucial issue for applications such as information retrieval, text categorization, and machine translation. This paper describes a method to improve Chinese–Japanese statistical machine translation of patents by re-tokenizing the training corpus with aligned bilingual multi-word terms. We automatically extract multi-word terms from monolingual corpora by combining statistical and linguistic filtering methods. An automatic alignment method is then used to identify corresponding terms. The most promising bilingual multi-word terms are selected by setting thresholds on translation probabilities and by further filtering on the characters that make up each bilingual multi-word term and on the ratio of their lengths in words. We also use kanji (Japanese)–hanzi (Chinese) character conversion to confirm and extract more promising bilingual multi-word terms. We obtain a high correspondence quality of 93% in bilingual term extraction and a significant improvement of 1.5 BLEU points in a translation experiment.
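The filtering and re-tokenization steps summarized in the abstract can be illustrated with a small sketch. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `TermPair`, `keep_pair`, `shared_char_ratio` and `retokenize`, the threshold values, the rule that character overlap after kanji-to-hanzi conversion can rescue a low-probability pair, the underscore used to join a term into one token, and the tiny `KANJI_TO_HANZI` table are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

# HYPOTHETICAL toy table mapping a few Japanese kanji to simplified Chinese hanzi.
# A real system would need a complete conversion resource.
KANJI_TO_HANZI = {"機": "机", "訳": "译", "語": "语", "電": "电", "発": "发"}


@dataclass
class TermPair:
    zh: list[str]  # Chinese term, as segmented words
    ja: list[str]  # Japanese term, as segmented words
    prob: float    # translation probability assigned by the aligner


def to_hanzi(text: str) -> str:
    """Replace each kanji found in the toy table by its hanzi; keep other characters."""
    return "".join(KANJI_TO_HANZI.get(ch, ch) for ch in text)


def shared_char_ratio(zh: str, ja: str) -> float:
    """Fraction of Chinese characters that also occur in the converted Japanese term."""
    if not zh:
        return 0.0
    ja_chars = set(to_hanzi(ja))
    return sum(ch in ja_chars for ch in zh) / len(zh)


def keep_pair(pair: TermPair, min_prob: float = 0.5,
              max_len_ratio: float = 2.0, min_shared: float = 0.5) -> bool:
    """Assumed filter: comparable lengths in words, and either a high translation
    probability or enough shared characters after kanji-to-hanzi conversion."""
    n_zh, n_ja = len(pair.zh), len(pair.ja)
    if min(n_zh, n_ja) == 0:
        return False
    if max(n_zh, n_ja) / min(n_zh, n_ja) > max_len_ratio:
        return False
    shared = shared_char_ratio("".join(pair.zh), "".join(pair.ja))
    return pair.prob >= min_prob or shared >= min_shared


def retokenize(words: list[str], terms: set[tuple[str, ...]]) -> list[str]:
    """Greedy longest match: merge each known multi-word term into one '_'-joined token."""
    max_n = max((len(t) for t in terms), default=1)
    out, i = [], 0
    while i < len(words):
        for n in range(min(max_n, len(words) - i), 1, -1):
            cand = tuple(words[i:i + n])
            if cand in terms:
                out.append("_".join(cand))
                i += n
                break
        else:  # no multi-word term starts here: copy the single word
            out.append(words[i])
            i += 1
    return out


if __name__ == "__main__":
    pairs = [
        TermPair(["机器", "翻译"], ["機械", "翻訳"], 0.8),  # passes on probability
        TermPair(["发光", "装置"], ["発光", "装置"], 0.3),  # rescued by shared characters
        TermPair(["数据", "处理"], ["データ", "処理", "装置", "の", "部"], 0.9),  # length ratio too large
    ]
    kept = [p for p in pairs if keep_pair(p)]
    terms = {tuple(p.zh) for p in kept}
    print(retokenize(["本", "发明", "涉及", "机器", "翻译", "和", "发光", "装置"], terms))
```

In this toy run the third pair is dropped because its word-length ratio (5/2) exceeds the assumed limit of 2, and the sentence comes back with `机器_翻译` and `发光_装置` as single tokens, which is the kind of re-tokenized training input a phrase-based SMT system could then be trained on.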

Original language: English
Pages (from-to): 117-125
Number of pages: 9
Journal: IEEJ Transactions on Electrical and Electronic Engineering
Volume: 13
Issue number: 1
Publication status: Published - January 2018

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
