Bayesian audio alignment based on a unified generative model of music composition and performance

Akira Maezawa, Katsutoshi Itoyama, Kazuyoshi Yoshii, Hiroshi G. Okuno

Research output: Paper › peer-review

5 Citations (Scopus)

Abstract

This paper presents a new probabilistic model that can align multiple performances of a particular piece of music. Conventionally, dynamic time warping (DTW) and left-to-right hidden Markov models (HMMs) have often been used for audio-to-audio alignment based on a shallow acoustic similarity between performances. Those methods, however, cannot distinguish latent musical structures common to all performances and temporal dynamics unique to each performance. To solve this problem, our model explicitly represents two state sequences: a top-level sequence that determines the common structure inherent in the music itself and a bottom-level sequence that determines the actual temporal fluctuation of each performance. These two sequences are fused into a hierarchical Bayesian HMM and can be learned at the same time from the given performances. Since the top-level sequence assigns the same state for note combinations that repeatedly appear within a piece of music, we can unveil the latent structure of the piece. Moreover, we can easily compare different performances of the same piece by analyzing the bottom-level sequences. Experimental evaluation showed that our method outperformed the conventional methods.
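To make the two-level structure described in the abstract concrete, below is a minimal generative sketch, not the paper's actual model or inference code: a top-level Markov chain over score states shared by all performances, and a per-performance bottom-level dwell-time process that models each rendition's own timing. All dimensions, distributions, parameter values, and function names (e.g. sample_performance) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# ---- Illustrative sizes (assumptions, not taken from the paper) ----
K = 4        # number of top-level "score" states common to all performances
D = 12       # feature dimension (e.g., a chroma-like vector)
N_top = 20   # length of the shared top-level state sequence

# Top-level: a sticky Markov chain over score states, shared by every performance.
trans_top = np.full((K, K), 0.1 / (K - 1))
np.fill_diagonal(trans_top, 0.9)
means = rng.normal(size=(K, D))  # state-dependent emission means

z_top = [rng.integers(K)]
for _ in range(N_top - 1):
    z_top.append(rng.choice(K, p=trans_top[z_top[-1]]))

def sample_performance(z_top, stay_prob=0.7):
    """Bottom-level: each performance dwells in every top-level state for its own
    random number of frames (tempo fluctuation) and emits noisy features around
    that state's mean."""
    frames = []
    for z in z_top:
        dwell = rng.geometric(1 - stay_prob)  # performance-specific dwell time
        frames.append(means[z] + 0.3 * rng.normal(size=(dwell, D)))
    return np.vstack(frames)

# Two performances of the same piece: identical top-level structure,
# different frame-level timing, hence different lengths.
perf_a = sample_performance(z_top)
perf_b = sample_performance(z_top)
print(perf_a.shape, perf_b.shape)
```

Aligning perf_a and perf_b then amounts to inferring, for each frame of each performance, which shared top-level state generated it; in the paper this joint inference is carried out in a hierarchical Bayesian HMM rather than the forward sampling shown here.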

Original language: English
Pages: 233-238
Number of pages: 6
Publication status: Published - 1 Jan 2014
Event: 15th International Society for Music Information Retrieval Conference, ISMIR 2014 - Taipei, Taiwan, Province of China
Duration: 27 Oct 2014 - 31 Oct 2014

Conference

Conference: 15th International Society for Music Information Retrieval Conference, ISMIR 2014
Country/Territory: Taiwan, Province of China
City: Taipei
Period: 14/10/27 - 14/10/31

ASJC Scopus subject areas

  • Music
  • Information Systems
