Capsule Network Over Pre-Trained Language Model and User Writing Styles for Authorship Attribution on Short Texts

Zeping Huang, Mizuho Iwaihara

Research output: Conference contribution


Authorship Attribution (AA) is a sub-field of authorship analysis and text classification that attributes a text to the correct author among a closed set of candidate authors. Because short texts usually carry less information about their author, authorship attribution on short texts is often more challenging than on long texts. Recently, the widespread use of pre-trained language models has greatly improved the accuracy of text classification tasks. In this paper, we propose a model that combines the pre-trained language model BERTweet with capsule networks to address authorship attribution on tweets. BERTweet is the first large-scale, domain-specific pre-trained language model for English tweets, and it can generate high-quality sentence representations of tweets. We combine BERTweet with capsule networks, which are particularly powerful at capturing deep features of sentence representations. Together, BERTweet and the capsule layers help us achieve remarkable improvements on AA tasks. We also incorporate user writing styles into our model. We design new capsule network architectures that combine multiple capsule layers to generate representations from both tweets and user writing styles, improving prediction accuracy and robustness. Our experimental results show that our BERTweet_Capsule_UWS combination achieves state-of-the-art results on the known tweet AA dataset.
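The abstract does not spell out how the capsule layers operate, so the following is only a minimal sketch of the generic routing-by-agreement procedure commonly used in capsule networks, not the authors' BERTweet_Capsule_UWS architecture. The function names (`squash`, `route`, `predict_author`) and the toy prediction vectors are hypothetical; in a real model the prediction vectors would be derived from BERTweet representations and learned weight matrices.

```python
import math

def squash(vec):
    """Capsule nonlinearity: keep the direction, map the length into [0, 1)."""
    sq_norm = sum(x * x for x in vec)
    if sq_norm == 0.0:
        return [0.0 for _ in vec]
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm)
    return [scale * x for x in vec]

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def route(predictions, iterations=3):
    """Routing-by-agreement between one input and one output capsule layer.

    predictions[i][j] is the vector that input capsule i predicts for
    output capsule j. Returns one output vector per output capsule.
    """
    n_in = len(predictions)
    n_out = len(predictions[0])
    dim = len(predictions[0][0])
    logits = [[0.0] * n_out for _ in range(n_in)]
    outputs = []
    for _ in range(iterations):
        # Each input capsule distributes its vote across the output capsules.
        coupling = [softmax(row) for row in logits]
        outputs = []
        for j in range(n_out):
            weighted = [
                sum(coupling[i][j] * predictions[i][j][d] for i in range(n_in))
                for d in range(dim)
            ]
            outputs.append(squash(weighted))
        # Raise the logit where a prediction agrees with the resulting output.
        for i in range(n_in):
            for j in range(n_out):
                logits[i][j] += sum(
                    p * v for p, v in zip(predictions[i][j], outputs[j])
                )
    return outputs

def predict_author(predictions):
    """Pick the output capsule (candidate author) with the longest vector."""
    outputs = route(predictions)
    lengths = [math.sqrt(sum(x * x for x in v)) for v in outputs]
    best = max(range(len(lengths)), key=lengths.__getitem__)
    return best, lengths

# Toy input: two input capsules, two candidate authors, 2-dimensional capsules.
# Both input capsules make strong, similar predictions for author index 1.
toy_predictions = [
    [[0.1, 0.0], [1.0, 1.0]],
    [[0.0, 0.1], [1.0, 0.9]],
]
author_index, capsule_lengths = predict_author(toy_predictions)
```

In this sketch the length of each output capsule acts as a confidence score for one candidate author, which is how capsule classifiers typically read out a closed-set prediction.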

Title of host publication: CCRIS 2022 - Conference Proceeding
Subtitle of host publication: 2022 3rd International Conference on Control, Robotics and Intelligent System
Publisher: Association for Computing Machinery
Publication status: Published - Aug 26 2022
Event: 3rd International Conference on Control, Robotics and Intelligent System, CCRIS 2022 - Virtual, Online, China
Duration: Aug 26 2022 to Aug 28 2022


Name: ACM International Conference Proceeding Series


Conference: 3rd International Conference on Control, Robotics and Intelligent System, CCRIS 2022
City: Virtual, Online

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Software