Abstract
This paper addresses automatic generation of soft missing-feature masks (MFMs) based on leak energy estimation for a simultaneous speech recognition system. An MFM is used as a weight in the probability calculation of the recognition process. In previous work, a threshold-based zero-or-one function was applied to decide whether the spectral parameter in each frequency bin is reliable. Here, that function is extended to a weighted sigmoid function with two free parameters. In addition, a contribution ratio of static features is introduced for the probability calculation in a recognition process that takes both static and dynamic features as input. The ratio can be implemented as part of the soft mask. The average recognition rate with a soft MFM improved by about 5% for all directions compared with a conventional system based on a hard MFM. Word recognition rates improved from 70 to 80% for peripheral talkers and from 93 to 97% for front speech when speakers were 90 degrees apart.
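The contrast between the hard and soft masks described above can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything here is an assumption made for illustration: the per-bin reliability score is taken as an input in [0, 1], the two free parameters are taken to be a slope and a center (hypothetical names `slope` and `center`), the static-feature contribution ratio is modeled as a scaling factor (`weight`), and the mask is applied as a per-feature weight on log-probabilities. None of these names or choices come from the paper itself.

```python
import math


def hard_mask(reliability, threshold):
    """Zero-or-one mask: a bin is either fully trusted or fully ignored."""
    return 1.0 if reliability > threshold else 0.0


def soft_mask(reliability, slope, center, weight=1.0):
    """Sigmoid mask scaled by `weight` (assumed form, not the paper's).

    `slope` controls how sharply the mask rises and `center` is the
    reliability value at which the unscaled mask equals 0.5.
    """
    return weight / (1.0 + math.exp(-slope * (reliability - center)))


def masked_log_likelihood(log_probs, mask):
    """Weight each feature's log-probability by its mask value (assumed use)."""
    return sum(m * lp for m, lp in zip(mask, log_probs))


# Hypothetical reliability scores for four frequency bins.
reliabilities = [0.05, 0.45, 0.55, 0.95]
hard = [hard_mask(r, 0.5) for r in reliabilities]
soft = [soft_mask(r, slope=10.0, center=0.5) for r in reliabilities]
```

In this sketch the hard mask treats the bins at 0.45 and 0.55 as opposites, while the soft mask gives them similar intermediate weights, which is the kind of graded behavior a sigmoid allows near the decision boundary.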
Original language | English |
---|---|
Pages (from-to) | 992-995 |
Number of pages | 4 |
Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
Publication status | Published - 2008 Dec 1 |
Externally published | Yes |
Event | INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association - Brisbane, QLD, Australia; Duration: 2008 Sept 22 → 2008 Sept 26 |
Keywords
- Missing feature theory
- Robot audition
- Simultaneous speech recognition
- Soft mask
- Speech recognition
ASJC Scopus subject areas
- Human-Computer Interaction
- Signal Processing
- Software
- Sensory Systems