TY - GEN

T1 - A Fast and Exact Randomisation Test for Comparing Two Systems with Paired Data

AU - Suzuki, Rikiya

AU - Sakai, Tetsuya

N1 - Publisher Copyright:
© 2021 ACM.

PY - 2021/7/11

Y1 - 2021/7/11

N2 - The randomisation test with B trials has been used in the information retrieval (IR) field for comparing two systems with paired data (i.e., a common set of topics). It approximates the exact randomisation test whose time complexity is O(2n) for n topics. In this paper, we first show that the counting operation for obtaining the exact p-value in a randomisation test can be reduced to a subsequence sum enumeration problem that can be solved by dynamic programming. By taking advantage of this observation along with the fact that we only require a small number of significant digits in IR evaluation measure scores, we demonstrate that the time complexity of the exact randomisation test can be reduced to O(mn), where m is the maximum subsequence sum of the paired score differences. Hence, researchers can utilise our test if they want to avoid random sampling and/or normality assumptions and want a fast and reliable test. In addition, we utilise Vandermonde's convolution in order to theoretically explain a known fact, namely, that the sign test is a special case of the exact randomisation test.

AB - The randomisation test with B trials has been used in the information retrieval (IR) field for comparing two systems with paired data (i.e., a common set of topics). It approximates the exact randomisation test whose time complexity is O(2n) for n topics. In this paper, we first show that the counting operation for obtaining the exact p-value in a randomisation test can be reduced to a subsequence sum enumeration problem that can be solved by dynamic programming. By taking advantage of this observation along with the fact that we only require a small number of significant digits in IR evaluation measure scores, we demonstrate that the time complexity of the exact randomisation test can be reduced to O(mn), where m is the maximum subsequence sum of the paired score differences. Hence, researchers can utilise our test if they want to avoid random sampling and/or normality assumptions and want a fast and reliable test. In addition, we utilise Vandermonde's convolution in order to theoretically explain a known fact, namely, that the sign test is a special case of the exact randomisation test.

KW - permutation test

KW - randomization test

KW - sign test

KW - statistical significance

UR - http://www.scopus.com/inward/record.url?scp=85114449041&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85114449041&partnerID=8YFLogxK

U2 - 10.1145/3471158.3472228

DO - 10.1145/3471158.3472228

M3 - Conference contribution

AN - SCOPUS:85114449041

T3 - ICTIR 2021 - Proceedings of the 2021 ACM SIGIR International Conference on Theory of Information Retrieval

SP - 239

EP - 243

BT - ICTIR 2021 - Proceedings of the 2021 ACM SIGIR International Conference on Theory of Information Retrieval

PB - Association for Computing Machinery, Inc

T2 - 11th ACM SIGIR International Conference on Theory of Information Retrieval, ICTIR 2021

Y2 - 11 July 2021

ER -