Representing the Twittersphere: Archiving a representative sample of Twitter data under resource constraints

Airo Hino*, Robert A. Fahey

*Corresponding author for this work

Research output: Article (peer-reviewed)

27 Citations (Scopus)

Abstract

The rising popularity of social media posts, most notably Twitter posts, as a data source for social science research poses significant problems with regard to access to representative, high-quality data for analysis. Cheap, publicly available data such as that obtained from Twitter's public application programming interfaces is often of low quality, while high-quality data is expensive both financially and computationally. Moreover, data is often available only in real time, making post-hoc analysis difficult or impossible. We propose and test a methodology for inexpensively creating an archive of Twitter data through population sampling, yielding a database that is highly representative of the targeted user population (in this test case, the entire population of Japanese-language Twitter users). Comparing the tweet volume, keywords, and topics found in our sample data set with the ground truth of Twitter's full data feed confirmed a very high degree of representativeness in the sample. We conclude that this approach yields a data set that is suitable for a wide range of post-hoc analyses, while remaining cost-effective and accessible to a wide range of researchers.
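The abstract judges representativeness by comparing tweet volume and keywords in the sample against Twitter's full data feed, but this page does not state which statistics were used. The sketch below is a hypothetical illustration only. It assumes hourly tweet counts and keyword counts are already available for both data sets, and it uses Pearson correlation for volume and Spearman rank correlation for keyword frequencies. The function names (`pearson`, `ranks`, `keyword_rank_agreement`), the `top_k` cutoff, and the choice of these two statistics are assumptions, not the authors' published code.

```python
from collections import Counter
from math import sqrt

# Hypothetical helpers for illustration; not the authors' published method.

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    # Raises ZeroDivisionError if either sequence is constant.
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def keyword_rank_agreement(sample_counts, full_counts, top_k=100):
    """Spearman correlation (Pearson on ranks) over the full feed's top_k keywords.

    top_k is an illustrative assumption.
    """
    top = [w for w, _ in full_counts.most_common(top_k)]
    full = [full_counts[w] for w in top]
    samp = [sample_counts.get(w, 0) for w in top]  # keywords absent from the sample count as 0
    return pearson(ranks(full), ranks(samp))

# Usage with made-up numbers (not data from the paper):
hourly_full = [1200, 900, 400, 300, 800, 1500]
hourly_sample = [118, 93, 41, 29, 79, 152]
volume_r = pearson(hourly_full, hourly_sample)

full_kw = Counter({"election": 500, "earthquake": 300, "baseball": 200, "weather": 100})
sample_kw = Counter({"election": 48, "earthquake": 33, "baseball": 17, "weather": 11})
keyword_rho = keyword_rank_agreement(sample_kw, full_kw, top_k=4)
```

Both statistics ignore the sample's overall scaling factor. This lets a sample holding roughly a tenth of the tweets be compared directly against full-feed counts without first estimating the sampling rate.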

Original language: English
Pages (from-to): 175-184
Number of pages: 10
Journal: International Journal of Information Management
Volume: 48
DOI
Publication status: Published - Oct 2019

ASJC Scopus subject areas

  • Information Systems
  • Computer Networks and Communications
  • Library and Information Sciences
