Pitch-timbre disentanglement of musical instrument sounds based on VAE-based metric learning

Keitaro Tanaka, Ryo Nishikimi, Yoshiaki Bando, Kazuyoshi Yoshii, Shigeo Morishima

Research output: Contribution to journalConference articlepeer-review

4 Citations (Scopus)


This paper describes a representation learning method for disentangling an arbitrary musical instrument sound into latent pitch and timbre representations. Although such pitch-timbre disentanglement has been achieved with a variational autoencoder (VAE), especially for a predefined set of musical instruments, the latent pitch and timbre representations are outspread, making them hard to interpret. To mitigate this problem, we introduce a metric learning technique into a VAE with latent pitch and timbre spaces so that similar (different) pitches or timbres are mapped close to (far from) each other. Specifically, our VAE is trained with additional contrastive losses so that the latent distances between two arbitrary sounds of the same pitch or timbre are minimized, and those of different pitches or timbres are maximized. This training is performed under weak supervision that uses only whether the pitches and timbres of two sounds are the same or not, instead of their actual values. This improves the generalization capability for unseen musical instruments. Experimental results show that the proposed method can find better-structured disentangled representations with pitch and timbre clusters even for unseen musical instruments.

Original languageEnglish
Pages (from-to)111-115
Number of pages5
JournalICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Publication statusPublished - 2021
Event2021 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2021 - Virtual, Toronto, Canada
Duration: 2021 Jun 62021 Jun 11


  • Disentangled representation learning
  • Metric learning
  • Pitch
  • Timbre modeling
  • Variational autoencoder

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering


Dive into the research topics of 'Pitch-timbre disentanglement of musical instrument sounds based on VAE-based metric learning'. Together they form a unique fingerprint.

Cite this