Word segmentation for the sequences emitted from a word-valued source

Takashi Ishida*, Toshiyasu Matsushima, Shigeichi Hirasawa

*Corresponding author for this work

Research output: Conference contribution

Abstract

Word segmentation is the most fundamental and important process in Japanese and Chinese language processing. Because these languages do not separate words, a sentence must first be segmented into words. For this problem, approaches based on probabilistic language models are known to be highly effective, and this has been demonstrated in practice. Meanwhile, the word-valued source (WVS) has recently been proposed as a new class of source model for the source coding problem. This model can be expected to reflect more of the probabilistic structure of natural languages. A Japanese or Chinese sentence may be regarded as a sequence emitted from a non-prefix-free WVS. In this paper, as a first step toward applying the WVS to natural language processing, we formulate the word segmentation problem for sequences emitted from a non-prefix-free WVS. We then examine the word segmentation performance for these models through numerical computation.
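To make the segmentation problem concrete, the following is a minimal sketch and not the formulation used in the paper, which the abstract does not spell out. It assumes a memoryless source that emits words independently with known probabilities, and it recovers the most probable word sequence by dynamic programming. The function name `segment` and the toy dictionary `probs` are hypothetical and chosen only for illustration. The dictionary is deliberately non-prefix-free ("a" is a prefix of "ab"), so the emitted string has several valid segmentations.

```python
import math


def segment(sequence, word_probs):
    """Return the most probable split of `sequence` into dictionary words.

    Assumes words are emitted i.i.d. with probabilities `word_probs`
    (an illustrative assumption, not the model from the paper).
    Returns None if no split into dictionary words exists.
    """
    n = len(sequence)
    max_len = max(len(w) for w in word_probs)
    # best[i] = highest log-probability of any segmentation of sequence[:i]
    best = [-math.inf] * (n + 1)
    # back[i] = start index of the last word in that best segmentation
    back = [0] * (n + 1)
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = sequence[start:end]
            if word in word_probs and best[start] > -math.inf:
                score = best[start] + math.log(word_probs[word])
                if score > best[end]:
                    best[end] = score
                    back[end] = start
    if best[n] == -math.inf:
        return None
    # Walk the back-pointers from the end to recover the words.
    words = []
    i = n
    while i > 0:
        words.append(sequence[back[i]:i])
        i = back[i]
    return words[::-1]


# Toy non-prefix-free dictionary (hypothetical values).
probs = {"a": 0.4, "ab": 0.3, "b": 0.1, "ba": 0.2}
print(segment("abab", probs))  # → ['ab', 'ab']
```

Here "abab" could also be read as "a", "b", "a", "b" or "a", "ba", "b", among others. Under the assumed probabilities, "ab", "ab" has the largest product (0.09), so it is selected.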

Original language: English
Title of host publication: CIT 2007
Subtitle of host publication: 7th IEEE International Conference on Computer and Information Technology
Pages: 662-667
Number of pages: 6
DOI
Publication status: Published - 2007
Event: CIT 2007: 7th IEEE International Conference on Computer and Information Technology - Aizu-Wakamatsu, Fukushima, Japan
Duration: 16 Oct 2007 to 19 Oct 2007

Publication series

Name: CIT 2007: 7th IEEE International Conference on Computer and Information Technology

Conference

Conference: CIT 2007: 7th IEEE International Conference on Computer and Information Technology
Country/Territory: Japan
City: Aizu-Wakamatsu, Fukushima
Period: 2007/10/16 to 2007/10/19

ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems
  • Software
  • Mathematics (all)
