TY - GEN
T1 - Generating Short Product Descriptors Based on Very Little Training Data
AU - Xiao, Peng
AU - Lee, Joo Young
AU - Tao, Sijie
AU - Hwang, Young Sook
AU - Sakai, Tetsuya
N1 - Publisher Copyright:
© Springer Nature Switzerland AG 2020.
PY - 2020
Y1 - 2020
AB - We propose a pipeline model for summarising a short textual product description for inclusion in an online advertisement banner. While a standard approach is to truncate the advertiser’s original product description so that the text will fit the small banner, this simplistic approach often removes crucial information or attractive expressions from the original description. Our objective is to shorten the original description more intelligently, so that users’ click-through rate (CTR) will improve. One major difficulty in this task, however, is the lack of large training data: machine learning methods that rely on thousands of pairs of original and shortened texts would not be practical. Hence, our proposed method first employs a semi-supervised sequence tagging method called TagLM to convert the original description into a sequence of entities, and then applies a BiLSTM entity ranker that determines which entities should be preserved: the main idea is to tackle the data sparsity problem by leveraging sequences of entities rather than sequences of words. In our offline experiments with Korean data from the travel and fashion domains, our sequence tagger outperforms an LSTM-CRF baseline, and our entity ranker outperforms LambdaMART and RandomForest baselines. More importantly, in our online A/B testing, where the proposed method was compared to the simple truncation approach, the CTR improved by 34.1% in the desktop PC environment.
KW - Advertisement
KW - Classification
KW - Sequence tagging
KW - Summarisation
UR - http://www.scopus.com/inward/record.url?scp=85082388326&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85082388326&partnerID=8YFLogxK
DO - 10.1007/978-3-030-42835-8_12
M3 - Conference contribution
AN - SCOPUS:85082388326
SN - 9783030428341
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 133
EP - 144
BT - Information Retrieval Technology - 15th Asia Information Retrieval Societies Conference, AIRS 2019, Proceedings
A2 - Wang, Fu Lee
A2 - Xie, Haoran
A2 - Lam, Wai
A2 - Sun, Aixin
A2 - Ku, Lun-Wei
A2 - Hao, Tianyong
A2 - Chen, Wei
A2 - Wong, Tak-Lam
A2 - Tao, Xiaohui
PB - Springer
T2 - 15th Asia Information Retrieval Societies Conference, AIRS 2019
Y2 - 7 November 2019 through 9 November 2019
ER -