TY - GEN
T1 - Fully-automatic marker-based chunking in 11 European languages and counts of the number of analogies between chunks
AU - Takeya, Kota
AU - Lepage, Yves
PY - 2011
Y1 - 2011
N2 - Analogy has been proposed as a possible principle for example-based machine translation. For such a framework to work properly, the training data should contain a large number of analogies between sentences. Consequently, such a framework can only work properly with short and repetitive sentences. To handle longer and more varied sentences, cutting the sentences into chunks could be a solution if the number of analogies between chunks is confirmed to be large. This paper thus reports counts of the number of analogies obtained using different numbers of chunk markers in 11 European languages. These experiments confirm that the number of analogies between chunks is very large: several tens of thousands of analogies were found between chunks extracted from sentences among which very few analogies, if any, were found.
AB - Analogy has been proposed as a possible principle for example-based machine translation. For such a framework to work properly, the training data should contain a large number of analogies between sentences. Consequently, such a framework can only work properly with short and repetitive sentences. To handle longer and more varied sentences, cutting the sentences into chunks could be a solution if the number of analogies between chunks is confirmed to be large. This paper thus reports counts of the number of analogies obtained using different numbers of chunk markers in 11 European languages. These experiments confirm that the number of analogies between chunks is very large: several tens of thousands of analogies were found between chunks extracted from sentences among which very few analogies, if any, were found.
KW - Analogy
KW - Branching entropy
KW - Marker hypothesis
KW - Marker-based chunking
UR - http://www.scopus.com/inward/record.url?scp=84863876755&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84863876755&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84863876755
SN - 9784905166023
T3 - PACLIC 25 - Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation
SP - 567
EP - 576
BT - PACLIC 25 - Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation
T2 - 25th Pacific Asia Conference on Language, Information and Computation, PACLIC 25
Y2 - 16 December 2011 through 18 December 2011
ER -