TY - GEN
T1 - Word segmentation for the sequences emitted from a word-valued source
AU - Ishida, Takashi
AU - Matsushima, Toshiyasu
AU - Hirasawa, Shigeichi
PY - 2007
Y1 - 2007
AB - Word segmentation is the most fundamental and important process in Japanese and Chinese language processing. Because these languages do not mark boundaries between words, a sequence must first be separated into words. Approaches based on probabilistic language models are known to be highly efficient for this problem, and this has been shown in practice. Separately, the word-valued source (WVS) has recently been proposed as a new class of source model for the source coding problem. This model can be expected to reflect more of the probabilistic structure of natural languages. A Japanese or Chinese sentence may be regarded as a sequence emitted from a non-prefix-free WVS. In this paper, as a first step toward applying WVS to natural language processing, we formulate a word segmentation problem for sequences emitted from a non-prefix-free WVS. We then examine the word segmentation performance for these models by numerical computation.
UR - http://www.scopus.com/inward/record.url?scp=38049025202&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=38049025202&partnerID=8YFLogxK
U2 - 10.1109/CIT.2007.4385160
DO - 10.1109/CIT.2007.4385160
M3 - Conference contribution
AN - SCOPUS:38049025202
SN - 0769529836
SN - 9780769529837
T3 - CIT 2007: 7th IEEE International Conference on Computer and Information Technology
SP - 662
EP - 667
BT - CIT 2007
T2 - CIT 2007: 7th IEEE International Conference on Computer and Information Technology
Y2 - 16 October 2007 through 19 October 2007
ER -