TY - GEN
T1 - Unsupervised Answer Retrieval with Data Fusion for Community Question Answering
AU - Kato, Sosuke
AU - Shimizu, Toru
AU - Fujita, Sumio
AU - Sakai, Tetsuya
N1 - Publisher Copyright: © Springer Nature Switzerland AG 2020.
PY - 2020
Y1 - 2020
AB - Community question answering (cQA) systems have enjoyed the benefits of advances in neural information retrieval, some models of which need annotated documents as supervised data. However, in contrast with the amount of supervised data for cQA systems, user-generated data in cQA sites have been increasing greatly with time. Thus, focusing on unsupervised models, we tackle a task of retrieving relevant answers for new questions from existing cQA data and propose two frameworks to exploit a Question Retrieval (QR) model for Answer Retrieval (AR). The first framework ranks answers according to the combined scores of QR and AR models and the second framework ranks answers using the scores of a QR model and best answer flags. In our experiments, we applied the combination of our proposed frameworks and a classical fusion technique to AR models with a Japanese cQA data set containing approximately 9.4M question-answer pairs. When best answer flags in the cQA data cannot be utilized, our combination of AR and QR scores with data fusion outperforms a base AR model on average. When best answer flags can be utilized, the retrieval performance can be improved further. While our results lack statistical significance, we discuss effect sizes as well as future sample sizes to attain sufficient statistical power.
KW - Answer Retrieval
KW - Community question answering
KW - Data fusion
KW - Question Retrieval
KW - Unsupervised model
UR - http://www.scopus.com/inward/record.url?scp=85082400203&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85082400203&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-42835-8_2
DO - 10.1007/978-3-030-42835-8_2
M3 - Conference contribution
AN - SCOPUS:85082400203
SN - 9783030428341
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 10
EP - 21
BT - Information Retrieval Technology - 15th Asia Information Retrieval Societies Conference, AIRS 2019, Proceedings
A2 - Wang, Fu Lee
A2 - Xie, Haoran
A2 - Lam, Wai
A2 - Sun, Aixin
A2 - Ku, Lun-Wei
A2 - Hao, Tianyong
A2 - Chen, Wei
A2 - Wong, Tak-Lam
A2 - Tao, Xiaohui
PB - Springer
T2 - 15th Asia Information Retrieval Societies Conference, AIRS 2019
Y2 - 7 November 2019 through 9 November 2019
ER -