TY - GEN
T1 - Automatic singing voice to music video generation via mashup of singing video clips
AU - Hirai, Tatsunori
AU - Ikemiya, Yukara
AU - Yoshii, Kazuyoshi
AU - Nakano, Tomoyasu
AU - Goto, Masataka
AU - Morishima, Shigeo
N1 - Funding Information:
This work was supported by OngaCREST, CREST, JST, and partially supported by a JSPS Grant-in-Aid for JSPS Fellows.
Publisher Copyright:
© 2015 Tatsunori Hirai et al.
PY - 2015
Y1 - 2015
N2 - This paper presents a system that takes the audio signal of any song sung by a singer as input and automatically generates a music video clip in which the singer appears to be actually singing the song. Although music video clips have gained popularity in video streaming services, not all existing songs have corresponding video clips. Given a song sung by a singer, our system generates a singing video clip by reusing existing singing video clips featuring that singer. More specifically, the system retrieves short fragments of singing video clips that include a singing voice similar to that in the target song, and then concatenates these fragments using dynamic programming (DP). To achieve this, we propose a method for extracting singing scenes from music video clips by combining vocal activity detection (VAD) with mouth aperture detection (MAD). Subjective experimental results demonstrate the effectiveness of our system.
AB - This paper presents a system that takes the audio signal of any song sung by a singer as input and automatically generates a music video clip in which the singer appears to be actually singing the song. Although music video clips have gained popularity in video streaming services, not all existing songs have corresponding video clips. Given a song sung by a singer, our system generates a singing video clip by reusing existing singing video clips featuring that singer. More specifically, the system retrieves short fragments of singing video clips that include a singing voice similar to that in the target song, and then concatenates these fragments using dynamic programming (DP). To achieve this, we propose a method for extracting singing scenes from music video clips by combining vocal activity detection (VAD) with mouth aperture detection (MAD). Subjective experimental results demonstrate the effectiveness of our system.
UR - http://www.scopus.com/inward/record.url?scp=84988484675&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84988484675&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84988484675
T3 - Proceedings of the 12th International Conference in Sound and Music Computing, SMC 2015
SP - 153
EP - 159
BT - Proceedings of the 12th International Conference in Sound and Music Computing, SMC 2015
PB - Music Technology Research Group, Department of Computer Science, Maynooth University
T2 - 12th International Conference on Sound and Music Computing, SMC 2015
Y2 - 30 July 2015 through 1 August 2015
ER -