TY - GEN
T1 - Particle-filter based audio-visual beat-tracking for music robot ensemble with human guitarist
AU - Itohara, Tatsuhiko
AU - Otsuka, Takuma
AU - Mizumoto, Takeshi
AU - Ogata, Tetsuya
AU - Okuno, Hiroshi G.
PY - 2011
Y1 - 2011
AB - This paper presents an audio-visual beat-tracking method for ensemble robots performing with a human guitarist. Beat-tracking, i.e., the estimation of the tempo and beat times of music, is critical to high-quality musical ensemble performance. Since a human guitarist plays off-beats, back beats, and syncopations, the main problems in beat-tracking of human guitar playing are twofold: tempo changes and varying note lengths. Most conventional methods do not address human guitar playing and therefore fail to adapt to one or both of these problems. To solve both problems simultaneously, our method uses not only audio but also visual features. We extract audio features with Spectro-Temporal Pattern Matching (STPM) and visual features with optical flow, mean shift, and the Hough transform. Our method estimates tempo and beat times with a particle filter; both the acoustic features of guitar sounds and the visual features of arm motions are represented as particles. Each particle is determined based on the prior distributions of the audio and visual features, respectively. Experimental results confirm that our integrated audio-visual approach is robust against tempo changes and varying note lengths. They also show that the convergence rate of the estimation depends only slightly on the number of particles. The real-time factor is 0.88 with 200 particles, which shows that our method works in real time.
UR - http://www.scopus.com/inward/record.url?scp=84455179814&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84455179814&partnerID=8YFLogxK
U2 - 10.1109/IROS.2011.6048380
DO - 10.1109/IROS.2011.6048380
M3 - Conference contribution
AN - SCOPUS:84455179814
SN - 9781612844541
T3 - IEEE International Conference on Intelligent Robots and Systems
SP - 118
EP - 124
BT - IROS'11 - 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems
T2 - 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems: Celebrating 50 Years of Robotics, IROS'11
Y2 - 25 September 2011 through 30 September 2011
ER -