Preprocessing for acoustic-to-articulatory inversion using real-time MRI movies of Japanese speech

Research output: Contribution to journal › Conference article › peer-review

Abstract

Acoustic-to-articulatory inversion (AAI) estimates articulatory movements from acoustic speech signals. Traditional AAI relies on indirect estimation using articulatory models; however, recent work has proposed machine learning models that directly output real-time MRI (rtMRI) movies. This study applied such an existing model to rtMRI movies of Japanese speech to test whether the devised preprocessing methods enable highly accurate estimation. The preprocessing involves normalizing face alignment and filtering out extraneous regions. For objective evaluation, we measured the complex wavelet structural similarity (CW-SSIM). The results indicate that combining the normalization and filtering steps produces smooth rtMRI movies that closely resemble the originals (average CW-SSIM: LSTM, 0.795; BLSTM, 0.793), demonstrating the effectiveness of the preprocessing.
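The two preprocessing steps named in the abstract, face-alignment normalization and filtering of extraneous regions, could be sketched along the following lines. This is a minimal illustration, not the paper's actual implementation: the centroid-based alignment, the function names, and the binary vocal-tract mask are all assumptions made here for the example.

```python
import numpy as np

def normalize_alignment(frame, ref_center):
    """Hypothetical alignment step: shift the frame so that the centroid
    of its bright (above-mean) pixels lands on a fixed reference point,
    so that head position is consistent across frames/speakers."""
    ys, xs = np.nonzero(frame > frame.mean())
    cy, cx = ys.mean(), xs.mean()
    dy = int(round(ref_center[0] - cy))
    dx = int(round(ref_center[1] - cx))
    # np.roll keeps the frame size fixed while translating the content
    return np.roll(np.roll(frame, dy, axis=0), dx, axis=1)

def mask_extraneous(frame, mask):
    """Hypothetical filtering step: zero out regions outside a
    precomputed region-of-interest mask (e.g. outside the vocal tract)."""
    return frame * mask

# Usage sketch on a synthetic 64x64 "frame" with an off-center bright patch
frame = np.zeros((64, 64))
frame[10:20, 10:20] = 1.0
aligned = normalize_alignment(frame, ref_center=(32, 32))
roi = np.zeros((64, 64))
roi[16:48, 16:48] = 1.0          # assumed region of interest
clean = mask_extraneous(aligned, roi)
```

Real rtMRI preprocessing would use anatomical landmarks rather than an intensity centroid, but the structure (geometric normalization followed by region filtering) matches the pipeline the abstract describes.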

Original language: English
Pages (from-to): 1550-1554
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publication status: Published - 2024
Event: 25th Interspeech Conference 2024 - Kos Island, Greece
Duration: 2024 Sept 1 - 2024 Sept 5

Keywords

  • acoustic-to-articulatory inversion
  • deep learning
  • real-time MRI

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation
