Two Sample T-tests for IR Evaluation: Student or Welch?

Tetsuya Sakai*

*Corresponding author for this work

Research output: Conference contribution

17 citations (Scopus)

Abstract

There are two well-known versions of the t-test for comparing means from unpaired data: Student's t-test and Welch's t-test. While Welch's t-test does not assume homoscedasticity (i.e., equal variances), it involves approximations. A classical textbook recommendation would be to use Student's t-test if either the two sample sizes are similar or the two sample variances are similar, and to use Welch's t-test only when both of the above conditions are violated. However, a more recent recommendation seems to be to use Welch's t-test unconditionally. Using past data from both TREC and NTCIR, the present study demonstrates that the latter advice should not be followed blindly in the context of IR system evaluation. More specifically, our results suggest that if the sample sizes differ substantially and if the larger sample has a substantially larger variance, Welch's t-test may not be reliable.
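As an illustrative sketch (not taken from the paper), the two statistics contrasted in the abstract can be computed as follows. Student's test pools the two sample variances, whereas Welch's test keeps them separate and approximates the degrees of freedom with the Welch-Satterthwaite formula. The function names `student_t` and `welch_t` and the score lists are hypothetical, chosen only to mimic the situation the abstract singles out: a larger sample with a larger variance.

```python
import math
from statistics import mean, variance


def student_t(x, y):
    """Student's two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(x), len(y)
    # The pooled variance assumes both populations share a single variance.
    pooled = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    t = (mean(x) - mean(y)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def welch_t(x, y):
    """Welch's t statistic with Welch-Satterthwaite approximate degrees of freedom."""
    n1, n2 = len(x), len(y)
    # Each sample contributes its own variance; no equal-variance assumption.
    a, b = variance(x) / n1, variance(y) / n2
    t = (mean(x) - mean(y)) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


# Hypothetical scores (assumption, not data from the study):
# a larger, high-variance sample and a smaller, low-variance sample.
large = [0.10, 0.90, 0.20, 0.80, 0.30, 0.70, 0.15, 0.85, 0.25, 0.75, 0.40, 0.60]
small = [0.55, 0.60, 0.65, 0.60]

print("Student (t, df):", student_t(large, small))
print("Welch   (t, df):", welch_t(large, small))
```

When the two sample sizes are equal, both functions return the same t statistic and differ only in the degrees of freedom; the statistics themselves diverge once the sizes and the variances both differ. Obtaining p-values additionally requires the t distribution, which this sketch leaves out.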

Original language: English
Title of host publication: SIGIR 2016 - Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval
Publisher: Association for Computing Machinery, Inc
Pages: 1045-1048
Number of pages: 4
ISBN (electronic): 9781450342902
DOI
Publication status: Published - 7 Jul 2016
Event: 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2016 - Pisa, Italy
Duration: 17 Jul 2016 to 21 Jul 2016

Publication series

Name: SIGIR 2016 - Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval

Other

Other: 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2016
Country/Territory: Italy
City: Pisa
Period: 17 Jul 2016 to 21 Jul 2016

ASJC Scopus subject areas

  • Information Systems
  • Software
