Generating Short Product Descriptors Based on Very Little Training Data

Peng Xiao, Joo Young Lee, Sijie Tao, Young Sook Hwang, Tetsuya Sakai*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We propose a pipeline model for summarising a short textual product description for inclusion in an online advertisement banner. While a standard approach is to truncate the advertiser’s original product description so that the text will fit the small banner, this simplistic approach often removes crucial information or attractive expressions from the original description. Our objective is to shorten the original description more intelligently, so that users’ click-through rate (CTR) will improve. One major difficulty in this task, however, is the lack of large training data: machine learning methods that rely on thousands of pairs of original and shortened texts would not be practical. Hence, our proposed method first employs a semi-supervised sequence tagging method called TagLM to convert the original description into a sequence of entities, and then applies a BiLSTM entity ranker to determine which entities should be preserved. The main idea is to tackle the data sparsity problem by leveraging sequences of entities rather than sequences of words. In our offline experiments with Korean data from the travel and fashion domains, our sequence tagger outperforms an LSTM-CRF baseline, and our entity ranker outperforms LambdaMART and RandomForest baselines. More importantly, in our online A/B testing, where the proposed method was compared to the simple truncation approach, the CTR improved by 34.1% in the desktop PC environment.
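The two-stage idea in the abstract (tag the description into entities, then decide which entities to keep) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `toy_tagger` is a phrase lookup standing in for TagLM, `toy_scorer` is a fixed label weight standing in for the BiLSTM entity ranker, the greedy character-budget selection in `shorten` is one plausible way to turn scores into a banner text, and the English example string is invented (the paper's data are Korean). None of this is the authors' implementation.

```python
from typing import Callable, List, Tuple

Entity = Tuple[str, str]  # (surface text, entity label)

# Hypothetical phrase-to-label table and label weights, for illustration only.
TOY_LABELS = {
    "Jeju hotel": "PRODUCT",
    "ocean view": "FEATURE",
    "free breakfast": "FEATURE",
    "50% off": "PRICE",
}
TOY_WEIGHTS = {"PRODUCT": 3.0, "PRICE": 2.0, "FEATURE": 1.0}


def toy_tagger(text: str) -> List[Entity]:
    """Stand-in for a sequence tagger: return known phrases in reading order."""
    found = [(text.find(p), p, label) for p, label in TOY_LABELS.items() if p in text]
    return [(p, label) for _, p, label in sorted(found)]


def toy_scorer(entity: Entity) -> float:
    """Stand-in for an entity ranker: score an entity by its label alone."""
    return TOY_WEIGHTS.get(entity[1], 0.0)


def truncate(description: str, max_chars: int) -> str:
    """Baseline: cut the description at the banner's character limit."""
    return description[:max_chars].rstrip()


def shorten(
    description: str,
    tagger: Callable[[str], List[Entity]],
    scorer: Callable[[Entity], float],
    max_chars: int,
) -> str:
    """Keep the highest-scoring entities that fit, in their original order."""
    entities = tagger(description)
    # Visit entities from most to least important (ties keep reading order).
    order = sorted(range(len(entities)), key=lambda i: scorer(entities[i]), reverse=True)
    kept: List[int] = []
    used = 0
    for i in order:
        cost = len(entities[i][0]) + (1 if kept else 0)  # +1 for the joining space
        if used + cost <= max_chars:
            kept.append(i)
            used += cost
    return " ".join(entities[i][0] for i in sorted(kept))


text = "Jeju hotel with ocean view and free breakfast, now 50% off"
print(truncate(text, 20))                         # Jeju hotel with ocea
print(shorten(text, toy_tagger, toy_scorer, 20))  # Jeju hotel 50% off
```

With a 20-character budget, the truncation baseline cuts the text mid-word and loses the discount, while the entity-based version keeps the product and the price. Working over a handful of entity labels rather than raw words is what the abstract credits for coping with very little training data.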

Original language: English
Title of host publication: Information Retrieval Technology - 15th Asia Information Retrieval Societies Conference, AIRS 2019, Proceedings
Editors: Fu Lee Wang, Haoran Xie, Wai Lam, Aixin Sun, Lun-Wei Ku, Tianyong Hao, Wei Chen, Tak-Lam Wong, Xiaohui Tao
Publisher: Springer
Pages: 133-144
Number of pages: 12
ISBN (Print): 9783030428341
DOIs
Publication status: Published - 2020
Event: 15th Asia Information Retrieval Societies Conference, AIRS 2019 - Kowloon, Hong Kong
Duration: 2019 Nov 7 → 2019 Nov 9

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12004 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 15th Asia Information Retrieval Societies Conference, AIRS 2019
Country/Territory: Hong Kong
City: Kowloon
Period: 19/11/7 → 19/11/9

Keywords

  • Advertisement
  • Classification
  • Sequence tagging
  • Summarisation

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
