TY - GEN
T1 - Classifying community QA questions that contain an image
AU - Tamaki, Kenta
AU - Togashi, Riku
AU - Kato, Sosuke
AU - Fujita, Sumio
AU - Maeda, Hideyuki
AU - Sakai, Tetsuya
N1 - Publisher Copyright: © 2018 Association for Computing Machinery.
PY - 2018/9/10
Y1 - 2018/9/10
N2 - We consider the problem of automatically assigning a category to a given question posted to a Community Question Answering (CQA) site, where the question contains not only text but also an image. For example, CQA users may post a photograph of a dress and ask the community "Is this appropriate for a wedding?" where the appropriate category for this question might be "Manners, Ceremonial occasions." We tackle this problem using Convolutional Neural Networks with a DualNet architecture for combining the image and text representations. Our experiments with real data from Yahoo Chiebukuro and crowdsourced gold-standard categories show that the DualNet approach outperforms a text-only baseline (p = .0000), a sum-and-product baseline (p = .0000), Multimodal Compact Bilinear pooling (p = .0000), and a combination of sum-and-product and MCB (p = .0000), where the p-values are based on a randomised Tukey Honestly Significant Difference test with B = 5000 trials.
KW - community question answering
KW - convolutional neural networks
KW - question categorisation
UR - http://www.scopus.com/inward/record.url?scp=85063515836&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85063515836&partnerID=8YFLogxK
U2 - 10.1145/3234944.3234948
DO - 10.1145/3234944.3234948
M3 - Conference contribution
AN - SCOPUS:85063515836
T3 - ICTIR 2018 - Proceedings of the 2018 ACM SIGIR International Conference on the Theory of Information Retrieval
SP - 219
EP - 222
BT - ICTIR 2018 - Proceedings of the 2018 ACM SIGIR International Conference on the Theory of Information Retrieval
PB - Association for Computing Machinery, Inc
T2 - 8th ACM SIGIR International Conference on the Theory of Information Retrieval, ICTIR 2018
Y2 - 14 September 2018 through 17 September 2018
ER -