TY - JOUR
T1 - Plan Optimization to Bilingual Dictionary Induction for Low-resource Language Families
AU - Nasution, Arbi Haza
AU - Murakami, Yohei
AU - Ishida, Toru
N1 - Funding Information:
This work was mainly done when the first author was a PhD student in the Department of Social Informatics, Kyoto University. This research was partially supported by a Grant-in-Aid for Scientific Research (A) (17H00759, 2017-2020) and a Grant-in-Aid for Young Scientists (A) (17H04706, 2017-2020) from the Japan Society for the Promotion of Science (JSPS). This research was also partially supported by the Universitas Islam Riau (UIR) and Universiti Teknologi PETRONAS (UTP) Joint Research Program. The first author was supported by the Indonesia Endowment Fund for Education (LPDP). Authors’ addresses: A. H. Nasution (corresponding author), Universitas Islam Riau, Informatics Engineering, Jl. Kaharuddin Nasution 113, Pekanbaru, Riau, Indonesia; email: arbi@eng.uir.ac.id; Y. Murakami, Ritsumeikan University, Faculty of Information Science and Engineering, 1-1-1 Noji-higashi, Kusatsu, 525-8577, Shiga, Japan; email: yohei@fc.ritsumei.ac.jp; T. Ishida, Waseda University, School of Creative Science and Engineering, 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan; email: toru.ishida@aoni.waseda.jp.
Publisher Copyright:
© 2021 Association for Computing Machinery.
PY - 2021/4
Y1 - 2021/4
N2 - Creating a bilingual dictionary is the first crucial step in enriching low-resource languages. For closely related languages in particular, the constraint-based approach has been shown to be useful for inducing bilingual lexicons from two bilingual dictionaries via a pivot language. However, if no machine-readable dictionaries are available as input, we need to consider manual creation by bilingual native speakers. When the goal is to comprehensively create multiple bilingual dictionaries, it is difficult to determine an execution order for the constraint-based approach that reduces the total cost, even if several machine-readable bilingual dictionaries already exist. Plan optimization is crucial in composing the order of bilingual dictionary creation with consideration of the methods and their costs. We formalize plan optimization for creating bilingual dictionaries as a Markov Decision Process (MDP), with the goal of obtaining a more accurate estimate of the most feasible optimal plan with the least total cost before fully implementing constraint-based bilingual lexicon induction. We model a prior beta distribution of bilingual lexicon induction precision with language similarity and polysemy of the topology as the α and β parameters. This distribution is then used to model the cost function and the state transition probability. We estimated the cost of all investment plans as a baseline for evaluating the proposed MDP-based approach, with total cost as the evaluation metric. After using the posterior beta distribution from the first batch of experiments to construct the prior beta distribution for the second batch, the results show a 61.5% cost reduction compared to the estimated all investment plans and a 39.4% cost reduction compared to the estimated MDP optimal plan. The MDP-based proposal outperformed the baseline on total cost.
AB - Creating a bilingual dictionary is the first crucial step in enriching low-resource languages. For closely related languages in particular, the constraint-based approach has been shown to be useful for inducing bilingual lexicons from two bilingual dictionaries via a pivot language. However, if no machine-readable dictionaries are available as input, we need to consider manual creation by bilingual native speakers. When the goal is to comprehensively create multiple bilingual dictionaries, it is difficult to determine an execution order for the constraint-based approach that reduces the total cost, even if several machine-readable bilingual dictionaries already exist. Plan optimization is crucial in composing the order of bilingual dictionary creation with consideration of the methods and their costs. We formalize plan optimization for creating bilingual dictionaries as a Markov Decision Process (MDP), with the goal of obtaining a more accurate estimate of the most feasible optimal plan with the least total cost before fully implementing constraint-based bilingual lexicon induction. We model a prior beta distribution of bilingual lexicon induction precision with language similarity and polysemy of the topology as the α and β parameters. This distribution is then used to model the cost function and the state transition probability. We estimated the cost of all investment plans as a baseline for evaluating the proposed MDP-based approach, with total cost as the evaluation metric. After using the posterior beta distribution from the first batch of experiments to construct the prior beta distribution for the second batch, the results show a 61.5% cost reduction compared to the estimated all investment plans and a 39.4% cost reduction compared to the estimated MDP optimal plan. The MDP-based proposal outperformed the baseline on total cost.
KW - Plan optimization
KW - closely related languages
KW - low-resource languages
KW - pivot-based bilingual lexicon induction
UR - http://www.scopus.com/inward/record.url?scp=85105732473&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85105732473&partnerID=8YFLogxK
U2 - 10.1145/3448215
DO - 10.1145/3448215
M3 - Article
AN - SCOPUS:85105732473
SN - 2375-4699
VL - 20
JO - ACM Transactions on Asian and Low-Resource Language Information Processing
JF - ACM Transactions on Asian and Low-Resource Language Information Processing
IS - 2
M1 - 29
ER -