Do Extractive Summarization Algorithms Amplify Lexical Bias in News Articles?

Rei Shimizu, Sumio Fujita, Tetsuya Sakai

Research output: Conference contribution

Abstract

Users who read news summaries on search engine result pages and social media may not access the original news articles. Hence, if the summaries are automatically generated, it is vital that the automatic summaries represent the contents of the original articles accurately and fairly. The present study is concerned with lexical bias in sentences: a sentence is considered lexically biased if it contains expressions that may strongly influence the reader's opinion about a topic either positively or negatively. More specifically, we are interested in whether extractive summarizers can amplify lexical bias, by excessively extracting lexically biased sentences from the original article and thus misrepresent it. To address this question, we first introduce the Bias Independence Principle (BIP), which says that the probability that a sentence is selected by an extractive summarizer should be independent of whether the sentence is lexically biased or not. Based on the BIP, we propose an evaluation measure for extractive summarizers called the Bias Independence Criterion (BIC), which compares the distribution of the sentence scores for lexically biased sentences and that of the sentence scores for non-biased sentences. Moreover, based on the BIC, we define another measure called the Summary Feature Permutation Importance (SFPI) to examine whether a particular feature used by a feature-based extractive summarizer is responsible for amplifying lexical bias. Our experimental results suggest that a) different extractive summarizers can amplify lexical bias to different degrees; b) the features useful for extracting informative sentences may also be responsible for amplifying lexical bias; and c) as mean ROUGE scores increase (implying higher informativeness), mean BIC scores also tend to increase (implying a higher concentration of lexically biased sentences).
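The Bias Independence Principle stated in the abstract can be illustrated with a small empirical check. The sketch below is not the paper's BIC, whose exact definition is not given in this record; it only compares how often biased and non-biased sentences are selected. The function names, the boolean input encoding, and the example data are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def selection_rates(selected: Sequence[bool], biased: Sequence[bool]) -> Tuple[float, float]:
    """Return the empirical rates (P(selected | biased), P(selected | not biased)).

    Hypothetical helper: `selected[i]` says whether sentence i was extracted,
    `biased[i]` says whether sentence i was labelled lexically biased.
    """
    if len(selected) != len(biased):
        raise ValueError("selected and biased must have the same length")
    n_biased = sum(1 for b in biased if b)
    n_clean = len(biased) - n_biased
    if n_biased == 0 or n_clean == 0:
        raise ValueError("need at least one sentence in each group")
    sel_biased = sum(1 for s, b in zip(selected, biased) if s and b)
    sel_clean = sum(1 for s, b in zip(selected, biased) if s and not b)
    return sel_biased / n_biased, sel_clean / n_clean


def selection_rate_gap(selected: Sequence[bool], biased: Sequence[bool]) -> float:
    """Difference between the two rates; 0.0 means the sample is consistent with the BIP."""
    p_biased, p_clean = selection_rates(selected, biased)
    return p_biased - p_clean


# Made-up example: six sentences, three of them labelled biased.
example_selected = [True, True, False, False, True, False]
example_biased = [True, True, True, False, False, False]
example_gap = selection_rate_gap(example_selected, example_biased)
```

A positive gap in this toy setting would mean biased sentences are selected more often than non-biased ones, which is the kind of amplification the abstract describes; the paper itself compares sentence score distributions rather than binary selections.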

Original language: English
Title of host publication: ICTIR 2022 - Proceedings of the 2022 ACM SIGIR International Conference on the Theory of Information Retrieval
Publisher: Association for Computing Machinery, Inc
Pages: 133-137
Number of pages: 5
ISBN (Electronic): 9781450394123
DOI
Publication status: Published - 23 Aug 2022
Event: 8th ACM SIGIR International Conference on the Theory of Information Retrieval, ICTIR 2022 - Virtual, Online, Spain
Duration: 11 Jul 2022 to 12 Jul 2022

Publication series

Name: ICTIR 2022 - Proceedings of the 2022 ACM SIGIR International Conference on the Theory of Information Retrieval

Conference

Conference: 8th ACM SIGIR International Conference on the Theory of Information Retrieval, ICTIR 2022
Country/Territory: Spain
City: Virtual, Online
Period: 22/7/11 to 22/7/12

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Information Systems
