TY - GEN
T1 - Performance estimation of spontaneous speech recognition using non-reference acoustic features
AU - Guo, Ling
AU - Yamada, Takeshi
AU - Makino, Shoji
N1 - Publisher Copyright:
© 2016 Asia Pacific Signal and Information Processing Association.
PY - 2017/1/17
Y1 - 2017/1/17
N2 - To ensure a satisfactory QoE (Quality of Experience), it is essential to establish a method that can be used to efficiently investigate recognition performance for spontaneous speech. By using this method, it is allowed to monitor the recognition performance in providing speech recognition services. It can be also used as a reliability measure in speech dialogue systems. Previously, methods for estimating the performance of noisy speech recognition based on spectral distortion measures have been proposed. Although they give an estimate of recognition performance without actually performing speech recognition, the methods cannot be applied to spontaneous speech because they require the reference speech to obtain the distortion values. To solve this problem, we propose a novel method for estimating the recognition performance of spontaneous speech with various speaking styles. The main feature is to use non-reference acoustic features that do not require the reference speech. The proposed method extracts non-reference features by openSMILE (open-Source Media Interpretation by Large feature-space Extraction) and then estimates the recognition performance by using SVR (Support Vector Regression). We confirmed the effectiveness of the proposed method by experiments using spontaneous speech data from the OGVC (On-line Gaming Voice Chat) corpus.
AB - To ensure a satisfactory QoE (Quality of Experience), it is essential to establish a method that can be used to efficiently investigate recognition performance for spontaneous speech. By using this method, it is allowed to monitor the recognition performance in providing speech recognition services. It can be also used as a reliability measure in speech dialogue systems. Previously, methods for estimating the performance of noisy speech recognition based on spectral distortion measures have been proposed. Although they give an estimate of recognition performance without actually performing speech recognition, the methods cannot be applied to spontaneous speech because they require the reference speech to obtain the distortion values. To solve this problem, we propose a novel method for estimating the recognition performance of spontaneous speech with various speaking styles. The main feature is to use non-reference acoustic features that do not require the reference speech. The proposed method extracts non-reference features by openSMILE (open-Source Media Interpretation by Large feature-space Extraction) and then estimates the recognition performance by using SVR (Support Vector Regression). We confirmed the effectiveness of the proposed method by experiments using spontaneous speech data from the OGVC (On-line Gaming Voice Chat) corpus.
UR - http://www.scopus.com/inward/record.url?scp=85013743598&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85013743598&partnerID=8YFLogxK
U2 - 10.1109/APSIPA.2016.7820792
DO - 10.1109/APSIPA.2016.7820792
M3 - Conference contribution
AN - SCOPUS:85013743598
T3 - 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016
BT - 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016
Y2 - 13 December 2016 through 16 December 2016
ER -