Text-Only Domain Adaptation Based on Intermediate CTC

Hiroaki Sato*, Tomoyasu Komori, Takeshi Mishima, Yoshihiko Kawai, Takahiro Mochizuki, Shoei Sato, Tetsuji Ogawa

*Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review

7 Citations (Scopus)


We propose a domain adaptation method that enables connectionist temporal classification (CTC)-based end-to-end (E2E) automatic speech recognition (ASR) models to adapt to a target domain using unpaired text data. The performance of ASR models deteriorates for words and topics not present in the training data, such as the latest news. Although it is difficult to collect paired speech and text data for such subjects, unpaired text data is relatively easy to obtain. Therefore, a domain adaptation method using unpaired text data is proposed for the E2E ASR model based on the intermediate CTC. This model introduces an adaptation branch to embed acoustic and linguistic information in the same latent space, allowing for domain adaptation using unpaired text data of the target domain. Experimental comparisons for multiple out-of-domain settings demonstrate that the proposed text-only domain adaptation achieves a comparable or better performance than the existing shallow-fusion-based domain adaptation, and further performance improvement is achieved by integration with shallow fusion.
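For readers unfamiliar with the shallow-fusion baseline mentioned above, the following is a minimal sketch of the general idea only: at decoding time, each hypothesis is scored by adding a weighted log-probability from an external language model (trained on target-domain text) to the ASR log-probability. It does not reproduce the paper's proposed adaptation branch or its experimental setup. The function names, the example hypotheses, their scores, and the weight value of 0.3 are all illustrative assumptions.

```python
# Generic shallow-fusion rescoring sketch (illustrative, not the authors' code).
# All names and numbers below are hypothetical.

def shallow_fusion_score(asr_logprob, lm_logprob, lm_weight=0.3):
    """Combine ASR and external LM log-probabilities for one hypothesis."""
    return asr_logprob + lm_weight * lm_logprob


def rescore(hypotheses, lm_weight=0.3):
    """Return the text of the best hypothesis.

    hypotheses: list of (text, asr_logprob, lm_logprob) tuples.
    """
    best = max(
        hypotheses,
        key=lambda h: shallow_fusion_score(h[1], h[2], lm_weight),
    )
    return best[0]


# Two made-up hypotheses: the ASR model slightly prefers the first,
# while the target-domain LM strongly prefers the second.
hyps = [
    ("the whether today", -4.0, -12.0),
    ("the weather today", -4.5, -6.0),
]

print(rescore(hyps))                 # LM weight 0.3
print(rescore(hyps, lm_weight=0.0))  # ASR score only
```

In this toy setting the LM weight decides which hypothesis wins, which is why shallow fusion can help on out-of-domain vocabulary when only target-domain text is available.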

Original language: English
Pages (from-to): 2208-2212
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publication status: Published - 2022
Event: 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022 - Incheon, Korea, Republic of
Duration: 2022 Sept 18 to 2022 Sept 22


Keywords

  • domain adaptation
  • end-to-end speech recognition
  • non-autoregressive
  • unpaired text

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation

