Contribution of Improved Character Embedding and Latent Posting Styles to Authorship Attribution of Short Texts

Wenjing Huang, Rui Su, Mizuho Iwaihara*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

7 Citations (Scopus)

Abstract

Text content generated on social networking platforms tends to be short. Authorship attribution on short texts is the problem of determining the author of a given collection of short posts, and it is more challenging than attribution on long texts. Considering that such texts are sparse and use informal terms, we propose a method of learning text representations from a mixture of words and character n-grams, which serves as input to deep neural network architectures. In this way we make full use of user mentions and topic mentions in posts. We also focus on implicit textual characteristics and incorporate ten latent posting styles into the models. Our experimental evaluations on tweets show a significant improvement over baselines. We achieve a best accuracy of 83.6%, which is a 7.5% improvement over the state of the art. Further experiments with an increasing number of authors also demonstrate the superiority of our models.
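The abstract does not spell out how words and character n-grams are mixed, so the following is only a minimal sketch of one plausible reading: user mentions (`@name`) and topic mentions (`#tag`) are kept as whole word tokens, while every other word is split into overlapping character n-grams. The function names `char_ngrams` and `mixed_tokens`, the regular expression, and the default `n=2` are illustrative assumptions, not the authors' implementation.

```python
import re

# Assumption: a mention is a whole token starting with '@' or '#'.
MENTION = re.compile(r"^[@#]\w+$")


def char_ngrams(text, n):
    """Return all overlapping character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def mixed_tokens(post, n=2):
    """Keep mentions and hashtags whole; split other words into character n-grams."""
    tokens = []
    for word in post.split():
        if MENTION.match(word) or len(word) < n:
            # Mentions stay intact; words shorter than n cannot be split.
            tokens.append(word)
        else:
            tokens.extend(char_ngrams(word, n))
    return tokens


example = mixed_tokens("@bob loves #nlp", n=2)
print(example)
```

A token sequence of this kind could then be mapped to embeddings and fed to a CNN or LSTM classifier, the two model families named in the keywords. The actual input pipeline may differ.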

Original language: English
Title of host publication: Web and Big Data - 4th International Joint Conference, APWeb-WAIM 2020, Proceedings
Editors: Xin Wang, Rui Zhang, Young-Koo Lee, Le Sun, Yang-Sae Moon
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 261-269
Number of pages: 9
ISBN (Print): 9783030602895
Publication status: Published - 2020
Event: 4th Asia-Pacific Web and Web-Age Information Management, Joint Conference on Web and Big Data, APWeb-WAIM 2020 - Tianjin, China
Duration: 2020 Sept 18 to 2020 Sept 20

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12318 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 4th Asia-Pacific Web and Web-Age Information Management, Joint Conference on Web and Big Data, APWeb-WAIM 2020
Country/Territory: China
City: Tianjin
Period: 20/9/18 to 20/9/20

Keywords

  • Authorship attribution
  • CNN
  • Character n-grams
  • LSTM
  • Latent posting styles
  • Short texts
  • Social network platforms

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
