A Closer Look at Evaluation Measures for Ordinal Quantification

Tetsuya Sakai*

*Corresponding author for this work

Research output: Conference article, peer-reviewed

Abstract

In his ACL 2021 paper [1], Sakai compared several evaluation measures in the context of Ordinal Quantification (OQ) tasks in terms of system ranking similarity, system ranking consistency (i.e., robustness to the choice of test data), and discriminative power (i.e., ability to find many statistically significant differences). Based on his experimental results, he recommended the use of his RNOD (Root Normalised Order-aware Divergence) measure along with NMD (Normalised Match Distance, i.e., normalised Earth Mover's Distance). The present study follows up on his discriminative power experiments by taking a much closer look at the statistical significance test results obtained from each evaluation measure. Our new analyses show that (1) RNOD is the overall winner among the OQ measures in terms of pooled discriminative power (i.e., discriminative power across multiple data sets); (2) NMD behaves noticeably differently from RNOD and from measures that cannot handle ordinal classes; and (3) NMD tends to favour a popularity-based baseline (which accesses the gold distributions) over a uniform-distribution baseline, thus contradicting the other measures in terms of statistical significance. As both RNOD and NMD have their merits, we recommend that the organisers of OQ tasks use both of them to evaluate the systems from multiple angles.
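
For readers unfamiliar with NMD, the following is a minimal sketch of one common way to compute a normalised Earth Mover's Distance between an estimated and a gold distribution over ordered classes. It assumes the match distance is the sum of absolute differences between the two cumulative distributions, divided by the number of classes minus one. The function name `nmd` and the toy distributions are hypothetical and are not taken from the paper.

```python
from itertools import accumulate


def nmd(estimated, gold):
    """Normalised Match Distance between two distributions over ordered classes.

    Assumption: both inputs are probability vectors over the same ordinal
    classes, listed in class order.
    """
    if len(estimated) != len(gold) or len(gold) < 2:
        raise ValueError("need two distributions over the same >= 2 classes")
    # Cumulative distributions over the ordered classes.
    cdf_est = list(accumulate(estimated))
    cdf_gold = list(accumulate(gold))
    # Match distance: total absolute gap between the cumulative distributions.
    md = sum(abs(a - b) for a, b in zip(cdf_est, cdf_gold))
    # Dividing by (number of classes - 1) maps the result into [0, 1].
    return md / (len(gold) - 1)


# Toy example: all gold mass sits on the first of three ordered classes.
gold = [1.0, 0.0, 0.0]
near_miss = [0.0, 1.0, 0.0]  # mass moved to the adjacent class
far_miss = [0.0, 0.0, 1.0]   # mass moved to the most distant class

print(nmd(gold, gold))       # identical distributions
print(nmd(near_miss, gold))  # smaller penalty for the adjacent class
print(nmd(far_miss, gold))   # largest penalty for the distant class
```

In this sketch the far miss is penalised twice as heavily as the near miss, which illustrates the sense in which such a measure respects class order, unlike measures that treat classes as nominal.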

Original language: English
Journal: CEUR Workshop Proceedings
Volume: 3052
Publication status: Published - 2021
Event: 2021 International Conference on Information and Knowledge Management Workshops, CIKMW 2021 - Gold Coast, Australia
Duration: 1 November 2021 to 5 November 2021

ASJC Scopus subject areas

  • General Computer Science
