TY - GEN
T1 - Robot with two ears listens to more than two simultaneous utterances by exploiting harmonic structures
AU - Hirasawa, Yasuharu
AU - Takahashi, Toru
AU - Ogata, Tetsuya
AU - Okuno, Hiroshi G.
PY - 2011
Y1 - 2011
N2 - In real-world situations, people often hear more than two simultaneous sounds. For robots, when the number of sound sources exceeds that of sensors, the situation is called under-determined, and robots with two ears need to deal with this situation. Some studies on under-determined sound source separation use L1-norm minimization methods, but the performance of automatic speech recognition with the separated speech signals is poor due to spectral distortion. In this paper, a two-stage separation method to improve separation quality with low computational cost is presented. The first stage uses an L1-norm minimization method to extract the harmonic structures. The second stage exploits reliable harmonic structures to maintain acoustic features. Experiments that simulate three utterances recorded by two microphones in an anechoic chamber show that our method improves speech recognition correctness by about three points and is fast enough for real-time separation.
AB - In real-world situations, people often hear more than two simultaneous sounds. For robots, when the number of sound sources exceeds that of sensors, the situation is called under-determined, and robots with two ears need to deal with this situation. Some studies on under-determined sound source separation use L1-norm minimization methods, but the performance of automatic speech recognition with the separated speech signals is poor due to spectral distortion. In this paper, a two-stage separation method to improve separation quality with low computational cost is presented. The first stage uses an L1-norm minimization method to extract the harmonic structures. The second stage exploits reliable harmonic structures to maintain acoustic features. Experiments that simulate three utterances recorded by two microphones in an anechoic chamber show that our method improves speech recognition correctness by about three points and is fast enough for real-time separation.
UR - http://www.scopus.com/inward/record.url?scp=79960496413&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79960496413&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-21822-4_35
DO - 10.1007/978-3-642-21822-4_35
M3 - Conference contribution
AN - SCOPUS:79960496413
SN - 9783642218217
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 348
EP - 358
BT - Modern Approaches in Applied Intelligence - 24th International Conference on Industrial Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2011, Proceedings
T2 - 24th International Conference on Industrial Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2011
Y2 - 28 June 2011 through 1 July 2011
ER -