Improved Chinese-Japanese phrase-based MT quality using an extended quasi-parallel corpus

Hao Wang, Wei Yang, Yves Lepage

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

State-of-the-art phrase-based machine translation (MT) systems usually require large parallel corpora for training. Both the quality and the quantity of the training data directly influence the performance of such systems. For a language pair that lacks open-source bilingual corpora, reported translation scores are lower. This is the case for Chinese-Japanese. In this paper, we propose to extend an initial parallel corpus with quasi-parallel sentences instead of adding new parallel sentences. The extension is obtained by using monolingual analogical associations. Our experiments show that the use of such quasi-parallel corpora improves the performance of Chinese-Japanese translation systems.

Original language: English
Title of host publication: PIC 2014 - Proceedings of 2014 IEEE International Conference on Progress in Informatics and Computing
Editors: Yinglin Wang, Xuelong Li, Hongming Cai
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 6-10
Number of pages: 5
ISBN (Electronic): 9781479920334
Publication status: Published - 2014 Dec 2
Event: 2014 2nd IEEE International Conference on Progress in Informatics and Computing, PIC 2014 - Shanghai, China
Duration: 2014 May 16 to 2014 May 18

Publication series

Name: PIC 2014 - Proceedings of 2014 IEEE International Conference on Progress in Informatics and Computing

Conference

Conference: 2014 2nd IEEE International Conference on Progress in Informatics and Computing, PIC 2014
Country/Territory: China
City: Shanghai
Period: 14/5/16 to 14/5/18

Keywords

  • analogy
  • machine translation
  • paraphrasing
  • quasi-parallel data

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems
