The effect of corpus size on case frame acquisition for discourse analysis

Ryohei Sasano*, Daisuke Kawahara, Sadao Kurohashi

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

16 Citations (Scopus)

Abstract

This paper reports the effect of corpus size on case frame acquisition for discourse analysis in Japanese. For this study, we collected a Japanese corpus consisting of up to 100 billion words, and constructed case frames from corpora of six different sizes. Then, we applied these case frames to syntactic and case structure analysis, and zero anaphora resolution. We obtained better results by using case frames constructed from larger corpora; the performance was not saturated even with a corpus size of 100 billion words.

Original language: English
Title of host publication: NAACL HLT 2009 - Human Language Technologies
Subtitle of host publication: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 521-529
Number of pages: 9
ISBN (Print): 9781932432411
Publication status: Published - 2009
Externally published: Yes
Event: Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL HLT 2009 - Boulder, CO, United States
Duration: 2009 May 31 to 2009 Jun 5

Publication series

Name: NAACL HLT 2009 - Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Conference

Conference

Conference: Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL HLT 2009
Country/Territory: United States
City: Boulder, CO
Period: 09/5/31 to 09/6/5

ASJC Scopus subject areas

  • Language and Linguistics
  • Social Sciences (miscellaneous)
