TY - GEN
T1 - Semi-supervised discourse relation classification with structural learning
AU - Hernault, Hugo
AU - Bollegala, Danushka
AU - Ishizuka, Mitsuru
PY - 2011
Y1 - 2011
N2 - The corpora available for training discourse relation classifiers are annotated using a general set of discourse relations. However, for certain applications, custom discourse relations are required. Creating a new annotated corpus with a new relation taxonomy is a time-consuming and costly process. We address this problem by proposing a semi-supervised approach to discourse relation classification based on Structural Learning. First, we solve a set of auxiliary classification problems using unlabeled data. Second, the learned classifiers are used to extend feature vectors to train a discourse relation classifier. By defining a relevant set of auxiliary classification problems, we show that the proposed method brings an improvement of at least 50% in accuracy and F-score on the RST Discourse Treebank and Penn Discourse Treebank, when small training sets of ca. 1000 training instances are employed. This is an attractive prospect for training discourse relation classifiers on domains where little labeled training data is available.
UR - http://www.scopus.com/inward/record.url?scp=79952275230&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79952275230&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-19400-9_27
DO - 10.1007/978-3-642-19400-9_27
M3 - Conference contribution
AN - SCOPUS:79952275230
SN - 9783642193996
VL - 6608 LNCS
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 340
EP - 352
BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
T2 - 12th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2011
Y2 - 20 February 2011 through 26 February 2011
ER -