TY - GEN
T1 - IRT-based aggregation model of crowdsourced pairwise comparisons for evaluating machine translations
AU - Otani, Naoki
AU - Nakazawa, Toshiaki
AU - Kawahara, Daisuke
AU - Kurohashi, Sadao
N1 - Publisher Copyright: © 2016 Association for Computational Linguistics
PY - 2016
Y1 - 2016
N2 - Recent work on machine translation has used crowdsourcing to reduce the cost of manual evaluations. However, crowdsourced judgments are often biased and inaccurate. In this paper, we present a statistical model that aggregates many manual pairwise comparisons to robustly measure a machine translation system's performance. Our method applies the graded response model from item response theory (IRT), which was originally developed for academic tests. We conducted experiments on a public dataset from the Workshop on Statistical Machine Translation 2013 and found that our approach resulted in highly interpretable estimates and was less affected by noisy judges than previously proposed methods.
UR - http://www.scopus.com/inward/record.url?scp=85072845881&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85072845881&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85072845881
T3 - EMNLP 2016 - Conference on Empirical Methods in Natural Language Processing, Proceedings
SP - 511
EP - 520
BT - EMNLP 2016 - Conference on Empirical Methods in Natural Language Processing, Proceedings
PB - Association for Computational Linguistics (ACL)
T2 - 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016
Y2 - 1 November 2016 through 5 November 2016
ER -