TY - GEN
T1 - Improving crowdsourcing-based annotation of Japanese discourse relations
AU - Kishimoto, Yudai
AU - Sawada, Shinnosuke
AU - Murawaki, Yugo
AU - Kawahara, Daisuke
AU - Kurohashi, Sadao
N1 - Publisher Copyright:
© LREC 2018 - 11th International Conference on Language Resources and Evaluation. All rights reserved.
PY - 2019
Y1 - 2019
N2 - Although discourse parsing is an important and fundamental task in natural language processing, few languages have corpora annotated with discourse relations, and those that exist are small. Creating a new corpus of discourse relations by hand is costly and time-consuming. To cope with this problem, Kawahara et al. (2014) constructed a Japanese corpus with discourse annotations through crowdsourcing. However, they did not evaluate the quality of the annotation. In this paper, we evaluate the quality of the annotation using expert annotations. We find that crowdsourcing-based annotation still leaves much room for improvement. Based on the error analysis, we propose improvement techniques based on language tests. Using these techniques, we re-annotated the corpus and achieved an improvement of approximately 3% in F-measure. We will make the re-annotated data publicly available.
AB - Although discourse parsing is an important and fundamental task in natural language processing, few languages have corpora annotated with discourse relations, and those that exist are small. Creating a new corpus of discourse relations by hand is costly and time-consuming. To cope with this problem, Kawahara et al. (2014) constructed a Japanese corpus with discourse annotations through crowdsourcing. However, they did not evaluate the quality of the annotation. In this paper, we evaluate the quality of the annotation using expert annotations. We find that crowdsourcing-based annotation still leaves much room for improvement. Based on the error analysis, we propose improvement techniques based on language tests. Using these techniques, we re-annotated the corpus and achieved an improvement of approximately 3% in F-measure. We will make the re-annotated data publicly available.
KW - Crowdsourcing
KW - Discourse annotation
UR - http://www.scopus.com/inward/record.url?scp=85059904943&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85059904943&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85059904943
T3 - LREC 2018 - 11th International Conference on Language Resources and Evaluation
SP - 4044
EP - 4048
BT - LREC 2018 - 11th International Conference on Language Resources and Evaluation
A2 - Isahara, Hitoshi
A2 - Maegaard, Bente
A2 - Piperidis, Stelios
A2 - Cieri, Christopher
A2 - Declerck, Thierry
A2 - Hasida, Koiti
A2 - Mazo, Helene
A2 - Choukri, Khalid
A2 - Goggi, Sara
A2 - Mariani, Joseph
A2 - Moreno, Asuncion
A2 - Calzolari, Nicoletta
A2 - Odijk, Jan
A2 - Tokunaga, Takenobu
PB - European Language Resources Association (ELRA)
T2 - 11th International Conference on Language Resources and Evaluation, LREC 2018
Y2 - 7 May 2018 through 12 May 2018
ER -