TY - GEN
T1 - A Fast and Exact Randomisation Test for Comparing Two Systems with Paired Data
AU - Suzuki, Rikiya
AU - Sakai, Tetsuya
N1 - Publisher Copyright:
© 2021 ACM.
PY - 2021/7/11
Y1 - 2021/7/11
N2 - The randomisation test with B trials has been used in the information retrieval (IR) field for comparing two systems with paired data (i.e., a common set of topics). It approximates the exact randomisation test whose time complexity is O(2n) for n topics. In this paper, we first show that the counting operation for obtaining the exact p-value in a randomisation test can be reduced to a subsequence sum enumeration problem that can be solved by dynamic programming. By taking advantage of this observation along with the fact that we only require a small number of significant digits in IR evaluation measure scores, we demonstrate that the time complexity of the exact randomisation test can be reduced to O(mn), where m is the maximum subsequence sum of the paired score differences. Hence, researchers can utilise our test if they want to avoid random sampling and/or normality assumptions and want a fast and reliable test. In addition, we utilise Vandermonde's convolution in order to theoretically explain a known fact, namely, that the sign test is a special case of the exact randomisation test.
AB - The randomisation test with B trials has been used in the information retrieval (IR) field for comparing two systems with paired data (i.e., a common set of topics). It approximates the exact randomisation test whose time complexity is O(2n) for n topics. In this paper, we first show that the counting operation for obtaining the exact p-value in a randomisation test can be reduced to a subsequence sum enumeration problem that can be solved by dynamic programming. By taking advantage of this observation along with the fact that we only require a small number of significant digits in IR evaluation measure scores, we demonstrate that the time complexity of the exact randomisation test can be reduced to O(mn), where m is the maximum subsequence sum of the paired score differences. Hence, researchers can utilise our test if they want to avoid random sampling and/or normality assumptions and want a fast and reliable test. In addition, we utilise Vandermonde's convolution in order to theoretically explain a known fact, namely, that the sign test is a special case of the exact randomisation test.
KW - permutation test
KW - randomization test
KW - sign test
KW - statistical significance
UR - http://www.scopus.com/inward/record.url?scp=85114449041&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85114449041&partnerID=8YFLogxK
U2 - 10.1145/3471158.3472228
DO - 10.1145/3471158.3472228
M3 - Conference contribution
AN - SCOPUS:85114449041
T3 - ICTIR 2021 - Proceedings of the 2021 ACM SIGIR International Conference on Theory of Information Retrieval
SP - 239
EP - 243
BT - ICTIR 2021 - Proceedings of the 2021 ACM SIGIR International Conference on Theory of Information Retrieval
PB - Association for Computing Machinery, Inc
T2 - 11th ACM SIGIR International Conference on Theory of Information Retrieval, ICTIR 2021
Y2 - 11 July 2021
ER -