IRT-based aggregation model of crowdsourced pairwise comparisons for evaluating machine translations

Naoki Otani, Toshiaki Nakazawa, Daisuke Kawahara, Sadao Kurohashi

Research output: Conference contribution

7 Citations (Scopus)

Abstract

Recent work on machine translation has used crowdsourcing to reduce the cost of manual evaluation. However, crowdsourced judgments are often biased and inaccurate. In this paper, we present a statistical model that aggregates many manual pairwise comparisons to robustly measure a machine translation system's performance. Our method applies the graded response model from item response theory (IRT), which was originally developed for academic tests. We conducted experiments on a public dataset from the Workshop on Statistical Machine Translation 2013, and found that our approach resulted in highly interpretable estimates and was less affected by noisy judges than previously proposed methods.
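As a rough illustration of the graded response model mentioned in the abstract, the sketch below computes category probabilities under the standard textbook form of the model, P(Y >= k) = sigmoid(a * (theta - b_k)). It is not the paper's exact parameterization. The function name `grm_category_probs`, the three outcome categories (lose, tie, win), the reading of `theta` as a quality gap between two systems, and the reading of `discrimination` as a per-judge reliability are all assumptions made for this example.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def grm_category_probs(theta: float, discrimination: float,
                       thresholds: list[float]) -> list[float]:
    """Category probabilities under a textbook graded response model.

    theta          -- latent trait (assumed here: quality gap between two systems)
    discrimination -- slope a (assumed here: how reliable the judge is)
    thresholds     -- ascending category boundaries b_1 < ... < b_{K-1}
    """
    # Cumulative probabilities P(Y >= k), padded with P(Y >= 0) = 1
    # and P(Y >= K) = 0 so adjacent differences give each category.
    cum = [1.0]
    cum += [sigmoid(discrimination * (theta - b)) for b in thresholds]
    cum += [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


if __name__ == "__main__":
    # Hypothetical values: two thresholds give three outcomes (lose, tie, win).
    for a in (0.1, 1.0, 2.0):
        probs = grm_category_probs(theta=2.0, discrimination=a,
                                   thresholds=[-1.0, 1.0])
        print(a, [round(p, 3) for p in probs])
```

With a small `discrimination` value, the lose and win probabilities stay close to each other even for a large `theta`, which is one way a model of this kind can represent a noisy judge whose votes carry little information.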

Original language: English
Title of host publication: EMNLP 2016 - Conference on Empirical Methods in Natural Language Processing, Proceedings
Publisher: Association for Computational Linguistics (ACL)
Pages: 511-520
Number of pages: 10
ISBN (Electronic): 9781945626258
Publication status: Published - 2016
Externally published: Yes
Event: 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016 - Austin, United States
Duration: 1 Nov 2016 to 5 Nov 2016

Publication series

Name: EMNLP 2016 - Conference on Empirical Methods in Natural Language Processing, Proceedings

Conference

Conference: 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016
Country/Territory: United States
City: Austin
Period: 1 Nov 2016 to 5 Nov 2016

ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems
  • Computational Theory and Mathematics
