TY - GEN
T1 - Extraction of bilingual technical terms for Chinese-Japanese patent translation
AU - Yang, Wei
AU - Yan, Jinghui
AU - Lepage, Yves
N1 - Publisher Copyright:
© 2016 HLT-NAACL 2016 - 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Student Research Workshop. All rights reserved.
PY - 2016
Y1 - 2016
N2 - The translation of patents or scientific papers is a key issue that statistical machine translation (SMT) should help address. In this paper, we propose a method to improve Chinese-Japanese patent SMT by pre-marking the training corpus with aligned bilingual multi-word terms. We automatically extract multi-word terms from monolingual corpora by combining statistical and linguistic filtering methods. We use the sampling-based alignment method to identify aligned terms and set thresholds on translation probabilities to select the most promising bilingual multi-word terms. We pre-mark a Chinese-Japanese training corpus with the selected aligned bilingual multi-word terms. We obtain over 70% precision in bilingual term extraction and a significant improvement in BLEU scores in our experiments on a Chinese-Japanese patent parallel corpus.
UR - http://www.scopus.com/inward/record.url?scp=85077816438&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85077816438&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85077816438
T3 - HLT-NAACL 2016 - 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Student Research Workshop
SP - 81
EP - 87
BT - HLT-NAACL 2016 - 2016 Conference of the North American Chapter of the Association for Computational Linguistics
A2 - Andreas, Jacob
A2 - Choi, Eunsol
A2 - Lazaridou, Angeliki
PB - Association for Computational Linguistics (ACL)
T2 - 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, HLT-NAACL 2016
Y2 - 12 June 2016 through 17 June 2016
ER -