Answerable or not: Devising a dataset for extending machine reading comprehension

Mao Nakanishi, Tetsunori Kobayashi, Yoshihiko Hayashi

Research output: Conference contribution

4 citations (Scopus)

Abstract

Machine reading comprehension (MRC) has recently attracted attention in the fields of natural language processing and machine learning. One problematic presumption of current MRC technologies is that every question is assumed to be answerable from a given text passage. However, to achieve human-like language comprehension, a machine should also be able to distinguish not-answerable questions (NAQs) from answerable ones. Developing this capability requires a dataset that includes hard-to-detect NAQs, but constructing one manually would be expensive. This paper proposes a dataset creation method that alters an existing MRC dataset, the Stanford Question Answering Dataset (SQuAD), and describes the resulting dataset. The dataset is likely to be more valuable if each NAQ is labeled with how difficult it is to identify as an NAQ, because such difficulty levels would allow researchers to evaluate a machine's NAQ detection performance more precisely. We therefore propose a method for automatically assigning difficulty-level labels that essentially measures the similarity between a question and the target text passage. Our NAQ detection experiments demonstrate that the resulting dataset, with its difficulty-level annotations, is valid and potentially useful for developing advanced MRC models.
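The abstract states only that the difficulty labeling essentially measures the similarity between a question and its passage; the exact measure and level definitions are not given in this record. As a minimal sketch, assuming a simple lexical measure (the fraction of distinct question tokens that also occur in the passage) and hypothetical cutoffs, the idea can be illustrated as follows. The function names `overlap_ratio` and `difficulty_label`, the tokenizer, the three level names, and the thresholds are all illustrative assumptions, not the authors' method.

```python
import re


def tokenize(text: str) -> set:
    # Assumption: lowercase and keep runs of ASCII letters/digits as tokens.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_ratio(question: str, passage: str) -> float:
    # Fraction of distinct question tokens that also appear in the passage.
    q_tokens = tokenize(question)
    if not q_tokens:
        return 0.0
    return len(q_tokens & tokenize(passage)) / len(q_tokens)


def difficulty_label(question: str, passage: str, thresholds=(0.3, 0.6)) -> str:
    # Hypothetical rule: an NAQ that shares more wording with the passage
    # looks answerable, so it is treated as harder to detect.
    low, high = thresholds
    ratio = overlap_ratio(question, passage)
    if ratio >= high:
        return "hard"
    if ratio >= low:
        return "medium"
    return "easy"


if __name__ == "__main__":
    passage = "The Eiffel Tower is located in Paris and was completed in 1889."
    for q in ["When was the Eiffel Tower completed?",
              "Where is the Brooklyn Bridge located?",
              "Who designed the Sydney Opera House?"]:
        print(q, difficulty_label(q, passage))
```

With this toy passage, the three questions receive overlap ratios of 5/6, 1/2, and 1/6, and so are labeled "hard", "medium", and "easy" respectively. A real implementation would likely need a stronger similarity measure than raw token overlap; the paper itself should be consulted for the measure actually used.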

Original language: English
Host publication title: COLING 2018 - 27th International Conference on Computational Linguistics, Proceedings
Editors: Emily M. Bender, Leon Derczynski, Pierre Isabelle
Publisher: Association for Computational Linguistics (ACL)
Pages: 973-983
Number of pages: 11
ISBN (electronic): 9781948087506
Publication status: Published - 2018
Event: 27th International Conference on Computational Linguistics, COLING 2018 - Santa Fe, United States
Duration: Aug 20, 2018 to Aug 26, 2018

Publication series

Name: COLING 2018 - 27th International Conference on Computational Linguistics, Proceedings

Conference

Conference: 27th International Conference on Computational Linguistics, COLING 2018
Country/Territory: United States
City: Santa Fe
Period: 2018/8/20 to 2018/8/26

ASJC Scopus subject areas

  • Language and Linguistics
  • Computational Theory and Mathematics
  • Linguistics and Language
