Teacher-student learning for low-latency online speech enhancement using Wave-U-net

Sotaro Nakaoka, Li Li, Shota Inoue, Shoji Makino

Research output: Contribution to journal › Conference article › peer-review

15 Citations (Scopus)


In this paper, we propose a low-latency online extension of Wave-U-net for single-channel speech enhancement that uses teacher-student learning to reduce system latency while maintaining high enhancement performance. Wave-U-net is a recently proposed end-to-end source separation method that has achieved remarkable performance in singing voice separation and speech enhancement. Because it performs enhancement directly in the time domain, Wave-U-net can model phase information efficiently and avoids the limitations of the domain transformation required by conventional methods, which normally operate in the time-frequency domain. We apply Wave-U-net to face-to-face applications such as hearing aids and in-car communication systems, where a strict latency of less than 10 ms is required. To this end, we investigate online versions of Wave-U-net and propose teacher-student learning to prevent the performance degradation caused by shortening the input segment so that the system delay on a CPU is less than 10 ms. The experimental results show that the proposed model runs in real time with low latency and high performance, achieving a signal-to-distortion ratio (SDR) improvement of about 8.73 dB.
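The abstract's quantitative claims (a sub-10 ms latency budget, a student trained with help from a teacher, and an SDR improvement) can be made concrete with a small sketch. The Python below is a minimal illustration under stated assumptions, not the authors' implementation. `block_latency_ms` models latency as the buffering delay of one input segment plus a per-block compute time, a simplification that ignores I/O buffering and any lookahead. `teacher_student_loss` is one common distillation form (a weighted mix of the error against the clean target and the error against the teacher's output, with a hypothetical weight `alpha`); the abstract does not specify the loss the paper uses. `sdr_db` is a plain energy-ratio SDR, not the BSS Eval definition usually used to report SDR, so its numbers are not comparable to the 8.73 dB figure.

```python
import math
from typing import Sequence


def block_latency_ms(segment_samples: int, sample_rate_hz: int,
                     compute_ms: float = 0.0) -> float:
    """Simplified online latency (assumption): time to buffer one input
    segment plus the time to process it. Ignores I/O buffering and lookahead."""
    buffering_ms = 1000.0 * segment_samples / sample_rate_hz
    return buffering_ms + compute_ms


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two non-empty, equal-length waveforms."""
    if len(a) != len(b) or not a:
        raise ValueError("waveforms must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def teacher_student_loss(student_out: Sequence[float],
                         clean: Sequence[float],
                         teacher_out: Sequence[float],
                         alpha: float = 0.5) -> float:
    """Hypothetical distillation loss: the online (short-segment) student is
    pulled toward both the clean target and the output of a teacher that is
    assumed to see longer input context. alpha weights the teacher term."""
    target_term = mse(student_out, clean)
    teacher_term = mse(student_out, teacher_out)
    return (1.0 - alpha) * target_term + alpha * teacher_term


def sdr_db(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """Plain energy-ratio SDR in dB (not the BSS Eval definition)."""
    signal = sum(r * r for r in reference)
    error = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if error == 0.0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal / error)


def sdr_improvement_db(clean: Sequence[float], noisy: Sequence[float],
                       enhanced: Sequence[float]) -> float:
    """SDR improvement: SDR of the enhanced output minus SDR of the noisy input."""
    return sdr_db(clean, enhanced) - sdr_db(clean, noisy)
```

For example, at an assumed 16 kHz sampling rate a 128-sample segment buffers 8.0 ms of audio, so `block_latency_ms(128, 16000, compute_ms=1.5)` gives 9.5 ms, inside the 10 ms budget, whereas a 160-sample segment uses the entire budget before any computation. Shortening the segment this way is what deprives the online model of context, which is the degradation the teacher term is meant to offset.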

Original language: English
Pages (from-to): 661-665
Number of pages: 5
Journal: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Publication status: Published - 2021
Externally published: Yes
Event: 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2021 - Virtual, Toronto, Canada
Duration: 2021 Jun 6 to 2021 Jun 11

Keywords

  • Low-latency
  • Real-time
  • Single-channel speech enhancement
  • Teacher-student learning
  • Wave-U-net

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

