Bottleneck-Minimal Indexing for Generative Document Retrieval

Xin Du*, Lixin Xiu, Kumiko Tanaka-Ishii*

*Corresponding author for this work

Research output: Conference article, peer-reviewed

Abstract

We apply an information-theoretic perspective to reconsider generative document retrieval (GDR), in which a document x ∈ X is indexed by t ∈ T, and a neural autoregressive model is trained to map queries Q to T. GDR can be considered to involve information transmission from documents X to queries Q, with the requirement to transmit more bits via the indexes T. By applying Shannon's rate-distortion theory, the optimality of indexing can be analyzed in terms of the mutual information, and the design of the indexes T can then be regarded as a bottleneck in GDR. After reformulating GDR from this perspective, we empirically quantify the bottleneck underlying GDR. Finally, using the NQ320K and MARCO datasets, we evaluate our proposed bottleneck-minimal indexing method in comparison with various previous indexing methods, and we show that it outperforms those methods.

Original language: English
Pages (from-to): 11888-11904
Number of pages: 17
Journal: Proceedings of Machine Learning Research
Volume: 235
Publication status: Published - 2024
Event: 41st International Conference on Machine Learning, ICML 2024 - Vienna, Austria
Duration: 21 Jul 2024 to 27 Jul 2024

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability
