Introducing EM-FT for Manipuri-English Neural Machine Translation

Rudali Huidrom, Yves Lepage

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

This paper introduces pretrained word embeddings for Manipuri (mni), a low-resourced Indian language. The embeddings, based on fastText, are capable of handling the highly agglutinative Manipuri language. We then perform machine translation (MT) experiments using neural network (NN) models and confirm two observations. First, the Transformer architecture with the fastText word embedding model EM-FT achieves higher BLEU scores than the same architecture without it in all the NMT experiments. Second, adding more training data from a domain different from that of the test data negatively impacts translation accuracy. The resources reported in this paper are made available in the ELRA catalogue to help the low-resourced languages community with MT/NLP tasks.

Original language: English
Title of host publication: 6th Workshop on Indian Language Data
Subtitle of host publication: Resources and Evaluation, WILDRE 2022 - held in conjunction with the International Conference on Language Resources and Evaluation, LREC 2022 - Proceedings
Editors: Girish Nath Jha, Sobha Lalitha Devi, Kalika Bali, Atul Kr. Ojha
Publisher: European Language Resources Association (ELRA)
Pages: 1-6
Number of pages: 6
ISBN (Electronic): 9791095546870
Publication status: Published - 2022
Event: 6th Workshop on Indian Language Data: Resources and Evaluation, WILDRE 2022 - Marseille, France
Duration: 2022 Jun 20 → …

Publication series

Name: 6th Workshop on Indian Language Data: Resources and Evaluation, WILDRE 2022 - held in conjunction with the International Conference on Language Resources and Evaluation, LREC 2022 - Proceedings

Conference

Conference: 6th Workshop on Indian Language Data: Resources and Evaluation, WILDRE 2022
Country/Territory: France
City: Marseille
Period: 22/6/20 → …

Keywords

  • language technology
  • low resource language
  • neural machine translation

ASJC Scopus subject areas

  • Language and Linguistics
  • Education
  • Library and Information Sciences
  • Linguistics and Language
