TY - GEN
T1 - A sequential model for discourse segmentation
AU - Hernault, Hugo
AU - Bollegala, Danushka
AU - Ishizuka, Mitsuru
PY - 2010
Y1 - 2010
AB - Identifying discourse relations in a text is essential for various tasks in Natural Language Processing, such as automatic text summarization, question-answering, and dialogue generation. The first step of this process is segmenting a text into elementary units. In this paper, we present a novel model of discourse segmentation based on sequential data labeling. Namely, we use Conditional Random Fields to train a discourse segmenter on the RST Discourse Treebank, using a set of lexical and syntactic features. Our system is compared to other statistical and rule-based segmenters, including one based on Support Vector Machines. Experimental results indicate that our sequential model outperforms current state-of-the-art discourse segmenters, with an F-score of 0.94. This performance level is close to the human agreement F-score of 0.98.
UR - http://www.scopus.com/inward/record.url?scp=78650444122&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78650444122&partnerID=8YFLogxK
DO - 10.1007/978-3-642-12116-6_26
M3 - Conference contribution
AN - SCOPUS:78650444122
SN - 3642121152
SN - 9783642121159
VL - 6008 LNCS
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 315
EP - 326
BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
T2 - 11th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2010
Y2 - 21 March 2010 through 27 March 2010
ER -