TY - GEN
T1 - Do Extractive Summarization Algorithms Amplify Lexical Bias in News Articles?
AU - Shimizu, Rei
AU - Fujita, Sumio
AU - Sakai, Tetsuya
N1 - Publisher Copyright: © 2022 ACM.
PY - 2022/8/23
Y1 - 2022/8/23
N2 - Users who read news summaries on search engine result pages and social media may not access the original news articles. Hence, if the summaries are automatically generated, it is vital that the automatic summaries represent the contents of the original articles accurately and fairly. The present study is concerned with lexical bias in sentences: a sentence is considered lexically biased if it contains expressions that may strongly influence the reader's opinion about a topic either positively or negatively. More specifically, we are interested in whether extractive summarizers can amplify lexical bias, by excessively extracting lexically biased sentences from the original article and thus misrepresent it. To address this question, we first introduce the Bias Independence Principle (BIP), which says that the probability that a sentence is selected by an extractive summarizer should be independent of whether the sentence is lexically biased or not. Based on the BIP, we propose an evaluation measure for extractive summarizers called the Bias Independence Criterion (BIC), which compares the distribution of the sentence scores for lexically biased sentences and that of the sentence scores for non-biased sentences. Moreover, based on the BIC, we define another measure called the Summary Feature Permutation Importance (SFPI) to examine whether a particular feature used by a feature-based extractive summarizer is responsible for amplifying lexical bias. Our experimental results suggest that a) different extractive summarizers can amplify lexical bias to different degrees; b) the features useful for extracting informative sentences may also be responsible for amplifying lexical bias; and c) as mean ROUGE scores increase (implying higher informativeness), mean BIC scores also tend to increase (implying a higher concentration of lexically biased sentences).
AB - Users who read news summaries on search engine result pages and social media may not access the original news articles. Hence, if the summaries are automatically generated, it is vital that the automatic summaries represent the contents of the original articles accurately and fairly. The present study is concerned with lexical bias in sentences: a sentence is considered lexically biased if it contains expressions that may strongly influence the reader's opinion about a topic either positively or negatively. More specifically, we are interested in whether extractive summarizers can amplify lexical bias, by excessively extracting lexically biased sentences from the original article and thus misrepresent it. To address this question, we first introduce the Bias Independence Principle (BIP), which says that the probability that a sentence is selected by an extractive summarizer should be independent of whether the sentence is lexically biased or not. Based on the BIP, we propose an evaluation measure for extractive summarizers called the Bias Independence Criterion (BIC), which compares the distribution of the sentence scores for lexically biased sentences and that of the sentence scores for non-biased sentences. Moreover, based on the BIC, we define another measure called the Summary Feature Permutation Importance (SFPI) to examine whether a particular feature used by a feature-based extractive summarizer is responsible for amplifying lexical bias. Our experimental results suggest that a) different extractive summarizers can amplify lexical bias to different degrees; b) the features useful for extracting informative sentences may also be responsible for amplifying lexical bias; and c) as mean ROUGE scores increase (implying higher informativeness), mean BIC scores also tend to increase (implying a higher concentration of lexically biased sentences).
KW - evaluation
KW - evaluation measures
KW - lexical bias
KW - text summarization
UR - http://www.scopus.com/inward/record.url?scp=85138414179&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85138414179&partnerID=8YFLogxK
U2 - 10.1145/3539813.3545123
DO - 10.1145/3539813.3545123
M3 - Conference contribution
AN - SCOPUS:85138414179
T3 - ICTIR 2022 - Proceedings of the 2022 ACM SIGIR International Conference on the Theory of Information Retrieval
SP - 133
EP - 137
BT - ICTIR 2022 - Proceedings of the 2022 ACM SIGIR International Conference on the Theory of Information Retrieval
PB - Association for Computing Machinery, Inc
T2 - 8th ACM SIGIR International Conference on the Theory of Information Retrieval, ICTIR 2022
Y2 - 11 July 2022 through 12 July 2022
ER -