Capsule Network Over Pre-Trained Language Model and User Writing Styles for Authorship Attribution on Short Texts

Zeping Huang, Mizuho Iwaihara

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Authorship Attribution (AA) is a sub-field of authorship analysis and text classification that attributes a text to the correct author among a closed set of candidate authors. Because short texts usually contain less information about the author, authorship attribution on short texts is often more challenging than on long texts. Recently, the widespread use of pre-trained language models has greatly improved the accuracy of text classification tasks. In this paper, we propose a model that combines the pre-trained language model BERTweet with capsule networks to perform authorship attribution on tweets. BERTweet is the first large-scale domain-specific pre-trained language model for English tweets and can generate high-quality sentence representations of tweets. We combine BERTweet with capsule networks, which are particularly powerful at capturing deep features of sentence representations; together, BERTweet and the capsule networks yield remarkable improvements on AA tasks. We also incorporate user writing styles into our model. We design new capsule network architectures that combine multiple capsule layers to generate representations from tweets and user writing styles, improving prediction accuracy and robustness. Our experimental results show that the BERTweet_Capsule_UWS combination achieves state-of-the-art results on the known tweet AA dataset.
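
The abstract does not spell out the capsule layers themselves, so the following is only a minimal sketch of the commonly used capsule building blocks (the squash nonlinearity and routing-by-agreement), not the authors' architecture. The names `squash`, `softmax`, `route`, and `u_hat` are hypothetical, and the toy prediction vectors stand in for what would, in a real model, be learned transformations of BERTweet sentence representations.

```python
import math


def squash(v):
    """Squash nonlinearity: keeps the direction of v, maps its length into [0, 1)."""
    sq = sum(x * x for x in v)
    if sq == 0.0:
        return [0.0] * len(v)
    scale = sq / (1.0 + sq) / math.sqrt(sq)
    return [scale * x for x in v]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(u_hat, iterations=3):
    """Routing-by-agreement (illustrative, not the paper's exact layer).

    u_hat[i][j] is the prediction vector that input capsule i makes for
    output capsule j. Returns one squashed vector per output capsule.
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start uniform
    v = []
    for _ in range(iterations):
        c = [softmax(row) for row in b]  # coupling coefficients per input capsule
        v = []
        for j in range(n_out):
            # Weighted sum of predictions for output capsule j, then squash.
            s = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in))
                 for d in range(dim)]
            v.append(squash(s))
        for i in range(n_in):
            for j in range(n_out):
                # Raise the logit when prediction and output agree (dot product).
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v


# Toy input: 2 input capsules, 2 output capsules (think: 2 candidate authors).
u_hat = [[[1.0, 0.0], [0.0, 0.1]],
         [[0.9, 0.1], [0.1, 0.0]]]
outputs = route(u_hat)
lengths = [math.hypot(*vec) for vec in outputs]
predicted = max(range(len(lengths)), key=lengths.__getitem__)
```

In this kind of setup, the length of each output capsule vector is read as a confidence score for the corresponding class, so `predicted` is the index of the longest output vector.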

Original language: English
Title of host publication: CCRIS 2022 - Conference Proceeding
Subtitle of host publication: 2022 3rd International Conference on Control, Robotics and Intelligent System
Publisher: Association for Computing Machinery
Pages: 104-110
Number of pages: 7
ISBN (Electronic): 9781450396851
DOIs
Publication status: Published - 2022 Aug 26
Event: 3rd International Conference on Control, Robotics and Intelligent System, CCRIS 2022 - Virtual, Online, China
Duration: 2022 Aug 26 to 2022 Aug 28

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 3rd International Conference on Control, Robotics and Intelligent System, CCRIS 2022
Country/Territory: China
City: Virtual, Online
Period: 22/8/26 to 22/8/28

Keywords

  • Authorship attribution
  • Capsule network
  • Pre-trained language model
  • Text classification

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Software
