TY - JOUR
T1 - Flexible Pseudo-Relevance Feedback via Selective Sampling
AU - Sakai, Tetsuya
AU - Manabe, Toshihiko
AU - Koyama, Makoto
PY - 2005/6/1
Y1 - 2005/6/1
AB - Although Pseudo-Relevance Feedback (PRF) is a widely used technique for enhancing average retrieval performance, it may actually hurt performance for around one-third of a given set of topics. To enhance the reliability of PRF, Flexible PRF has been proposed, which adjusts the number of pseudo-relevant documents and/or the number of expansion terms for each topic. This paper explores a new, inexpensive Flexible PRF method, called Selective Sampling, which is unique in that it can skip documents in the initial ranked output to look for more “novel” pseudo-relevant documents. While Selective Sampling is only comparable to Traditional PRF in terms of average performance and reliability, per-topic analyses show that Selective Sampling outperforms Traditional PRF almost as often as Traditional PRF outperforms Selective Sampling. Thus, treating the top P documents as relevant is often not the best strategy. However, predicting when Selective Sampling outperforms Traditional PRF appears to be as difficult as predicting when a PRF method fails. For example, our per-topic analyses show that even the proportion of truly relevant documents in the pseudo-relevant set is not necessarily a good performance predictor.
KW - Experimentation
KW - Performance
KW - Pseudo-relevance feedback
KW - Flexible pseudo-relevance feedback
KW - Selective sampling
UR - http://www.scopus.com/inward/record.url?scp=33750320351&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33750320351&partnerID=8YFLogxK
U2 - 10.1145/1105696.1105699
DO - 10.1145/1105696.1105699
M3 - Article
AN - SCOPUS:33750320351
SN - 1530-0226
VL - 4
SP - 111
EP - 135
JO - ACM Transactions on Asian Language Information Processing
JF - ACM Transactions on Asian Language Information Processing
IS - 2
ER -