Correlation dimension of natural language in a statistical manifold

Research output: Article, peer-reviewed

1 citation (Scopus)

Abstract

The correlation dimension of natural language is measured by applying the Grassberger-Procaccia algorithm to high-dimensional sequences produced by a large-scale language model. This method, previously studied only in a Euclidean space, is reformulated in a statistical manifold via the Fisher-Rao distance. Language exhibits a multifractal, with global self-similarity and a universal dimension around 6.5, which is smaller than those of simple discrete random sequences and larger than that of a Barabási-Albert process. Long memory is the key to producing self-similarity. Our method is applicable to any probabilistic model of real-world discrete sequences, and we show an application to music data.
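The procedure named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each point is a categorical probability distribution (for example, a language model's next-token distribution), uses the closed-form Fisher-Rao distance between categorical distributions, d(p, q) = 2 arccos(Σ √(p_i q_i)), and estimates the correlation dimension as the slope of log C(r) against log r, where C(r) is the fraction of point pairs closer than r. The function names, the Dirichlet toy data, and the radius range are all hypothetical choices made for illustration.

```python
import numpy as np


def fisher_rao_distance(p, q):
    """Fisher-Rao distance between categorical distributions (last axis sums to 1)."""
    # Bhattacharyya coefficient; clipped to guard against rounding above 1.
    bc = np.sum(np.sqrt(p * q), axis=-1)
    return 2.0 * np.arccos(np.clip(bc, 0.0, 1.0))


def correlation_integral(dists, radii):
    """C(r): fraction of distinct pairs whose distance is below each radius r."""
    return np.array([(dists < r).mean() for r in radii])


def correlation_dimension(points, radii):
    """Grassberger-Procaccia estimate: slope of log C(r) versus log r."""
    i, j = np.triu_indices(len(points), k=1)  # all distinct pairs (i < j)
    dists = fisher_rao_distance(points[i], points[j])
    c = correlation_integral(dists, radii)
    mask = c > 0  # log is undefined where no pair falls within r
    slope, _ = np.polyfit(np.log(radii[mask]), np.log(c[mask]), 1)
    return slope


# Toy data (assumption): random distributions over 4 symbols, standing in
# for the probability vectors a language model would produce.
rng = np.random.default_rng(0)
points = rng.dirichlet(np.ones(4), size=500)
radii = np.geomspace(0.1, 0.5, 10)
dim = correlation_dimension(points, radii)
print(f"estimated correlation dimension: {dim:.2f}")
```

In practice the slope should be read only over the range of r where log C(r) is approximately linear in log r; the fixed radius range above is a simplification.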

Original language: English
Article number: L022028
Journal: Physical Review Research
Volume: 6
Issue number: 2
DOI
Publication status: Published - April 2024

ASJC Scopus subject areas

  • General Physics and Astronomy
