Leak Energy Based Missing Feature Mask Generation for ICA and GSS and Its Evaluation with Simultaneous Speech Recognition

Shun'ichi Yamamoto*, Ryu Takeda, Kazuhiro Nakadai, Mikio Nakano, Hiroshi Tsujino, Jean Marc Valin, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

*Corresponding author for this work

Research output: Contribution to conference › Paper › peer-review

Abstract

This paper addresses automatic speech recognition (ASR) for robots integrated with sound source separation (SSS) by using leak noise based missing feature mask generation. Missing feature theory (MFT) is a promising approach to improving the noise robustness of ASR. One issue in MFT-based ASR is the automatic generation of the missing feature mask. To improve robot audition, we applied this theory to interface ASR with SSS, which uses multiple microphones to extract a sound source originating from a specific direction. In a robot audition system, using SSS as a pre-processor for ASR is a promising way to deal with any kind of noise. However, ASR usually assumes clean speech input, while speech extracted by SSS is always distorted. MFT can be applied to cope with distortion in the extracted speech. In this case, we can assume that the noise in the extracted sounds is mainly leakage from other channels. Thus, we introduced leak noise based missing feature mask generation, which generates a missing feature mask automatically by using information on leak noise obtained from other channels. To assess the effectiveness of the leak noise based missing feature mask generation, we used two SSS methods, geometric source separation (GSS) and independent component analysis (ICA), together with Multiband Julian for MFT-based ASR. The two resulting systems, a GSS-based and an ICA-based robot audition system, were evaluated on recognition of simultaneous speech uttered by two speakers. The results showed that the proposed leak noise based missing feature mask generation worked well in both systems.
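The abstract does not give the mask formula, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each separated channel is available as per-frame band energies, that the leak into the target channel is a fixed fraction of the energy in the other channels, and that a band is reliable when the estimated leak is small relative to the target energy. The function name `leak_mask` and the parameters `leak_ratio` and `threshold` are hypothetical.

```python
def leak_mask(separated, target, leak_ratio=0.25, threshold=0.5):
    """Return a binary mask (1 = reliable, 0 = missing) for one channel.

    separated: list of channels; each channel is a list of frames;
               each frame is a list of non-negative band energies.
    target:    index of the channel whose mask is generated.
    leak_ratio, threshold: illustrative assumptions, not values from the paper.
    """
    others = [ch for i, ch in enumerate(separated) if i != target]
    mask = []
    for t, frame in enumerate(separated[target]):
        row = []
        for f, energy in enumerate(frame):
            # Assumed leak model: a fixed fraction of the other channels' energy.
            leak = leak_ratio * sum(ch[t][f] for ch in others)
            # A band with no energy, or dominated by leak, is marked missing.
            reliable = energy > 0 and leak / energy <= threshold
            row.append(1 if reliable else 0)
        mask.append(row)
    return mask


# Two separated channels (two simultaneous speakers), 2 frames x 3 bands.
example = [
    [[4.0, 1.0, 0.0], [2.0, 0.5, 3.0]],
    [[1.0, 8.0, 2.0], [0.0, 6.0, 1.0]],
]
for channel in range(len(example)):
    print(channel, leak_mask(example, channel))
```

In an MFT-based recognizer, such a mask would tell the decoder which feature components to trust; how the mask is actually consumed by Multiband Julian is outside what the abstract states.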

Original language: English
Pages: 42-47
Number of pages: 6
Publication status: Published - 2006
Externally published: Yes
Event: 2006 ISCA Tutorial and Research Workshop on Statistical and Perceptual Audio Processing, SAPA 2006 - Pittsburgh, United States
Duration: 2006 Sept 16 → …

Conference

Conference: 2006 ISCA Tutorial and Research Workshop on Statistical and Perceptual Audio Processing, SAPA 2006
Country/Territory: United States
City: Pittsburgh
Period: 06/9/16 → …

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Computer Vision and Pattern Recognition
  • Software
