TY - GEN
T1 - Discovering latent country words
T2 - 25th International Conference on Collaboration Technologies and Social Computing, CRIWG+CollabTech 2019
AU - Cho, Heeryon
AU - Ishida, Toru
N1 - Funding Information:
This research was supported by the National Research Foundation of South Korea (NRF) grant funded by the South Korean government (NRF-2017R1A2B4011015). This research was partially supported by a Grant-in-Aid for Scientific Research (A) (17H00759, 2017-2020) from Japan Society for the Promotion of Science (JSPS).
Publisher Copyright:
© Springer Nature Switzerland AG 2019.
PY - 2019
Y1 - 2019
AB - Knowing what concepts are substantial to each country can be helpful in enhancing emotional communication between two countries. As a concrete example of identifying substantial country concepts, we focus on a task of finding latent country words from cross-cultural texts of two countries. We do this by combining word embedding and tensor decomposition: common words that appear in both countries’ texts are selected; their country-specific word embeddings are learned; a three-way tensor consisting of a word factor, a word embedding factor, and a country factor is constructed; and CANDECOMP/PARAFAC decomposition is performed on the three-way tensor while fixing the country factor values of the decomposed result. We tested our method on a motivating example of finding latent country words from J-pop lyrics from Japan and K-pop lyrics from South Korea. We found that J-pop lyrics words feature nature-related motifs such as ‘petal’, ‘cloud’, ‘universe’, ‘star’, and ‘sky’, whereas K-pop lyrics words highlight human-body-related motifs such as ‘style’, ‘shirt’, ‘head’, ‘foot’, and ‘skin’.
KW - Cross-cultural text analysis
KW - Tensor decomposition
KW - Word embedding
UR - http://www.scopus.com/inward/record.url?scp=85072863236&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85072863236&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-28011-6_17
DO - 10.1007/978-3-030-28011-6_17
M3 - Conference contribution
AN - SCOPUS:85072863236
SN - 9783030280109
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 232
EP - 241
BT - Collaboration Technologies and Social Computing - 25th International Conference, CRIWG+CollabTech 2019, Proceedings
A2 - Nakanishi, Hideyuki
A2 - Egi, Hironori
A2 - Chounta, Irene-Angelica
A2 - Takada, Hideyuki
A2 - Ichimura, Satoshi
A2 - Hoppe, Ulrich
PB - Springer Verlag
Y2 - 4 September 2019 through 6 September 2019
ER -