TY - JOUR
T1 - Speech Emotion Recognition Enhanced Traffic Efficiency Solution for Autonomous Vehicles in a 5G-Enabled Space-Air-Ground Integrated Intelligent Transportation System
AU - Tan, Liang
AU - Yu, Keping
AU - Lin, Long
AU - Cheng, Xiaofan
AU - Srivastava, Gautam
AU - Lin, Jerry Chun Wei
AU - Wei, Wei
N1 - Funding Information:
This work was supported in part by the National Natural Science Foundation of China under Grant 61373162, in part by the Sichuan Provincial Science and Technology Department Project under Grant 2019YFG0183, in part by the Sichuan Provincial Key Laboratory Project under Grant KJ201402, and in part by the Japan Society for the Promotion of Science (JSPS) Grants-in-Aid for Scientific Research (KAKENHI) under Grant JP18K18044 and Grant JP21K17736.
Publisher Copyright:
© 2000-2011 IEEE.
PY - 2022/3/1
Y1 - 2022/3/1
AB - Speech emotion recognition (SER) is becoming the main human-computer interaction logic for autonomous vehicles in the next generation of intelligent transportation systems (ITSs). It can improve not only the safety of autonomous vehicles but also the personalized in-vehicle experience. However, current vehicle-mounted SER systems still suffer from two major shortcomings. The first is the insufficient service capacity of the vehicle communication network, which cannot meet the SER needs of autonomous vehicles in next-generation ITSs in terms of data transmission rate, power consumption, and latency. The second is the poor accuracy of SER, which cannot provide sufficient interactivity and personalization between users and vehicles. To address these issues, we propose an SER-enhanced traffic efficiency solution for autonomous vehicles in a 5G-enabled space-air-ground integrated network (SAGIN)-based ITS. First, we convert the vehicle speech information data into spectrograms and input them into an AlexNet network model to obtain the high-level features of the vehicle speech acoustic model. At the same time, we convert the vehicle speech information data into text information and input it into the Bidirectional Encoder Representations from Transformers (BERT) model to obtain the high-level features of the corresponding text model. Finally, these two sets of high-level features are cascaded to obtain fused features, which are sent to a softmax classifier for emotion matching and classification. Experiments show that the proposed solution can improve not only the SAGIN's service capabilities, resulting in large capacity, high bandwidth, ultralow latency, and high reliability, but also the accuracy of vehicle SER as well as the performance, practicality, and user experience of the ITS.
KW - 5G-enabled SAGIN
KW - ITS
KW - Speech emotion recognition
KW - artificial intelligence
KW - autonomous vehicles
UR - http://www.scopus.com/inward/record.url?scp=85118544541&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85118544541&partnerID=8YFLogxK
U2 - 10.1109/TITS.2021.3119921
DO - 10.1109/TITS.2021.3119921
M3 - Article
AN - SCOPUS:85118544541
SN - 1524-9050
VL - 23
SP - 2830
EP - 2842
JO - IEEE Transactions on Intelligent Transportation Systems
JF - IEEE Transactions on Intelligent Transportation Systems
IS - 3
ER -