TY - GEN
T1 - Automating Idea Unit Segmentation and Alignment for Assessing Reading Comprehension via Summary Protocol Analysis
AU - Gecchele, Marcello
AU - Yamada, Hiroaki
AU - Tokunaga, Takenobu
AU - Sawaki, Yasuyo
AU - Ishizuka, Mika
N1 - Funding Information:
This work was supported by the Japan Society for the Promotion of Science (JSPS) Grant-in-Aid for Scientific Research (B) (No. 20H01292; PI: Yasuyo Sawaki) This work was supported by JST SPRING, Grant Number JPMJSP2106.
Publisher Copyright:
© European Language Resources Association (ELRA), licensed under CC-BY-NC-4.0.
PY - 2022
Y1 - 2022
N2 - In this paper, we approach summary evaluation from an applied linguistics (AL) point of view. We provide computational tools to AL researchers to simplify the process of Idea Unit (IU) segmentation. The IU is a segmentation unit that can identify chunks of information. These chunks can be compared across documents to measure the content overlap between a summary and its source text. We propose a full revision of the annotation guidelines to allow machine implementation. The new guideline also improves the inter-annotator agreement, rising from 0.547 to 0.785 (Cohen's “κ”). We release L2WS 2021, a IU gold standard corpus composed of 40 manually annotated student summaries. We propose IUExtract; i.e. the first automatic segmentation algorithm based on the IU. The algorithm was tested over the L2WS 2021 corpus. Our results are promising, achieving a precision of 0.789 and a recall of 0.844. We tested an existing approach to IU alignment via word embeddings with the state of the art model SBERT. The recorded precision for the top 1 aligned pair of IUs was 0.375. We deemed this result insufficient for effective automatic alignment. We propose “SAT”, an online tool to facilitate the collection of alignment gold standards for future training.
AB - In this paper, we approach summary evaluation from an applied linguistics (AL) point of view. We provide computational tools to AL researchers to simplify the process of Idea Unit (IU) segmentation. The IU is a segmentation unit that can identify chunks of information. These chunks can be compared across documents to measure the content overlap between a summary and its source text. We propose a full revision of the annotation guidelines to allow machine implementation. The new guideline also improves the inter-annotator agreement, rising from 0.547 to 0.785 (Cohen's “κ”). We release L2WS 2021, a IU gold standard corpus composed of 40 manually annotated student summaries. We propose IUExtract; i.e. the first automatic segmentation algorithm based on the IU. The algorithm was tested over the L2WS 2021 corpus. Our results are promising, achieving a precision of 0.789 and a recall of 0.844. We tested an existing approach to IU alignment via word embeddings with the state of the art model SBERT. The recorded precision for the top 1 aligned pair of IUs was 0.375. We deemed this result insufficient for effective automatic alignment. We propose “SAT”, an online tool to facilitate the collection of alignment gold standards for future training.
KW - Alignment
KW - Idea Unit
KW - Segmentation
KW - Student Summary Evaluation
KW - Summary Response Analysis
UR - http://www.scopus.com/inward/record.url?scp=85144336988&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85144336988&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85144336988
T3 - 2022 Language Resources and Evaluation Conference, LREC 2022
SP - 4663
EP - 4673
BT - 2022 Language Resources and Evaluation Conference, LREC 2022
A2 - Calzolari, Nicoletta
A2 - Bechet, Frederic
A2 - Blache, Philippe
A2 - Choukri, Khalid
A2 - Cieri, Christopher
A2 - Declerck, Thierry
A2 - Goggi, Sara
A2 - Isahara, Hitoshi
A2 - Maegaard, Bente
A2 - Mariani, Joseph
A2 - Mazo, Helene
A2 - Odijk, Jan
A2 - Piperidis, Stelios
PB - European Language Resources Association (ELRA)
T2 - 13th International Conference on Language Resources and Evaluation Conference, LREC 2022
Y2 - 20 June 2022 through 25 June 2022
ER -