A sequential model for discourse segmentation

Hugo Hernault*, Danushka Bollegala, Mitsuru Ishizuka

*Corresponding author for this work

Research output: Conference contribution

31 Citations (Scopus)

Abstract

Identifying discourse relations in a text is essential for various tasks in Natural Language Processing, such as automatic text summarization, question-answering, and dialogue generation. The first step of this process is segmenting a text into elementary units. In this paper, we present a novel model of discourse segmentation based on sequential data labeling. Namely, we use Conditional Random Fields to train a discourse segmenter on the RST Discourse Treebank, using a set of lexical and syntactic features. Our system is compared to other statistical and rule-based segmenters, including one based on Support Vector Machines. Experimental results indicate that our sequential model outperforms current state-of-the-art discourse segmenters, with an F-score of 0.94. This performance level is close to the human agreement F-score of 0.98.
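The F-scores quoted in the abstract can be read as a harmonic mean of precision and recall over predicted segment boundaries. The following is a minimal sketch of that computation, not the authors' evaluation code: it assumes each token carries a binary label (1 if it marks a segment boundary, 0 otherwise), and the function name `boundary_f_score` and the toy label sequences are hypothetical. The exact boundary-matching protocol used in the paper is not stated in this record.

```python
from typing import Sequence


def boundary_f_score(gold: Sequence[int], pred: Sequence[int]) -> float:
    """F1 over tokens labeled 1 (segment boundary) versus 0 (no boundary).

    Assumption: labels are binary and aligned token by token.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    # A true positive is a token marked as a boundary in both sequences.
    true_pos = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    pred_pos = sum(pred)  # boundaries proposed by the segmenter
    gold_pos = sum(gold)  # boundaries in the reference annotation

    if true_pos == 0:
        return 0.0

    precision = true_pos / pred_pos
    recall = true_pos / gold_pos
    return 2 * precision * recall / (precision + recall)


# Toy example (hypothetical labels, not data from the paper).
gold = [0, 0, 1, 0, 0, 1, 0, 1]
pred = [0, 0, 1, 0, 1, 1, 0, 0]
print(round(boundary_f_score(gold, pred), 3))
```

In this toy case the segmenter recovers two of the three reference boundaries and proposes one spurious boundary, so precision and recall are both 2/3.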

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 315-326
Number of pages: 12
Volume: 6008 LNCS
DOI
Publication status: Published - 2010
Externally published: Yes
Event: 11th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2010 - Iasi
Duration: 21 March 2010 to 27 March 2010

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 6008 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 11th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2010
City: Iasi
Period: 21 March 2010 to 27 March 2010

ASJC Scopus subject areas

  • General Computer Science
  • Theoretical Computer Science
