Two-encoder pointer-generator network for summarizing segments of long articles

Junhao Li, Mizuho Iwaihara*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Long documents usually contain many sections and segments. In Wikipedia, an article can usually be divided into sections, and a section can be divided into segments. Although an article is already divided into smaller segments, a single segment can still be too long to read. We therefore argue that each segment should have a short summary that lets readers quickly grasp its content. This paper discusses applying neural summarization models, including the Seq2Seq model and the pointer-generator network, to segment summarization. These models can take the target segment as their only input. In our setting, however, the remaining segments of the same article are very likely to contain descriptions related to the target segment. We thus propose several ways to extract an additional sequence from the whole article and combine it with the target segment as the input for summarization, and we compare the results against the original models without additional sequences. Furthermore, we propose a new model that uses two encoders to process the target segment and the additional sequence separately. Our results show that the two-encoder model outperforms the original models in terms of ROUGE and METEOR scores.
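The abstract does not say how the additional sequence is extracted, so the following is only a minimal sketch under stated assumptions: it ranks sentences from the other segments by lexical overlap with the target segment, which is a stand-in heuristic and not necessarily one of the methods proposed in the paper. All function names (`tokenize`, `overlap_score`, `extract_additional_sequence`, `build_inputs`) and the `max_sentences` parameter are hypothetical. The sketch shows the two input layouts the abstract contrasts: one concatenated sequence for a single-encoder model, and a separate pair for a two-encoder model.

```python
import re
from typing import Dict, List, Set, Tuple, Union


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep runs of letters and digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(candidate: str, target_vocab: Set[str]) -> float:
    """Fraction of the candidate's distinct tokens that also occur in the target."""
    tokens = set(tokenize(candidate))
    if not tokens:
        return 0.0
    return len(tokens & target_vocab) / len(tokens)


def extract_additional_sequence(
    target: str, other_segments: List[str], max_sentences: int = 2
) -> str:
    """Pick the sentences from the other segments that overlap most with the target.

    Assumption: lexical overlap is used purely for illustration; the paper's
    actual extraction methods are not described in the abstract.
    """
    target_vocab = set(tokenize(target))
    sentences: List[str] = []
    for segment in other_segments:
        # Naive sentence split on whitespace that follows ., ! or ?
        parts = re.split(r"(?<=[.!?])\s+", segment)
        sentences.extend(p.strip() for p in parts if p.strip())
    ranked = sorted(
        sentences, key=lambda s: overlap_score(s, target_vocab), reverse=True
    )
    return " ".join(ranked[:max_sentences])


def build_inputs(
    target: str, other_segments: List[str], max_sentences: int = 2
) -> Dict[str, Union[str, Tuple[str, str]]]:
    """Prepare the input for both model layouts."""
    additional = extract_additional_sequence(target, other_segments, max_sentences)
    return {
        # Single-encoder baseline: target and additional text in one sequence.
        "single_encoder_input": (target + " " + additional).strip(),
        # Two-encoder variant: each sequence goes to its own encoder.
        "two_encoder_inputs": (target, additional),
    }


if __name__ == "__main__":
    target_segment = "The pointer generator network copies words from the source."
    rest_of_article = [
        "Cats sleep all day. The network copies rare words from the source text.",
        "Bananas are yellow.",
    ]
    print(build_inputs(target_segment, rest_of_article, max_sentences=1))
```

In a real system the two strings in `two_encoder_inputs` would be tokenized and fed to separate encoders whose outputs the decoder attends over; that part depends on the model implementation and is left out here.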

Original language: English
Title of host publication: Web and Big Data - 3rd International Joint Conference, APWeb-WAIM 2019, Proceedings
Editors: Jie Shao, Man Lung Yiu, Masashi Toyoda, Dongxiang Zhang, Wei Wang, Bin Cui
Publisher: Springer Verlag
Pages: 299-313
Number of pages: 15
ISBN (Print): 9783030260712
Publication status: Published - 2019
Event: 3rd APWeb and WAIM Joint Conference on Web and Big Data, APWeb-WAIM 2019 - Chengdu, China
Duration: 2019 Aug 1 to 2019 Aug 3

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11641 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 3rd APWeb and WAIM Joint Conference on Web and Big Data, APWeb-WAIM 2019
Country/Territory: China
City: Chengdu
Period: 19/8/1 to 19/8/3

Keywords

  • Deep learning
  • Multi-encoder
  • Pointer generator network
  • Seq2Seq
  • Text summarization

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)
