AN EXPLORATION OF HUBERT WITH LARGE NUMBER OF CLUSTER UNITS AND MODEL ASSESSMENT USING BAYESIAN INFORMATION CRITERION

Takashi Maekaku, Xuankai Chang, Yuya Fujita, Shinji Watanabe

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Self-supervised learning (SSL) has become one of the most important technologies for realizing spoken dialogue systems in languages for which little audio data and transcription are available. Speech representation models are one of the keys to achieving this and have been actively studied in recent years. Among them, Hidden-Unit BERT (HuBERT) has shown promising results in automatic speech recognition (ASR) tasks. However, previous studies investigated HuBERT with only a limited number of iterations and cluster units. We explore HuBERT with larger numbers of clusters and iterations in order to obtain better speech representations. Furthermore, we introduce the Bayesian Information Criterion (BIC) as a performance measure of the model. Experimental results show that our model achieves the best performance in 5 out of 8 scores across the 4 metrics of the Zero Resource Speech 2021 task. It also outperforms the HuBERT BASE model trained on 960 hours of LibriSpeech (LS), even though our model is trained on only 100 hours of LS. In addition, we report that BIC is a useful clue for determining the appropriate number of clusters to improve performance on phonetic, lexical, and syntactic metrics. Finally, we show that these findings also carry over to the ASR task.
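The abstract's use of BIC to pick the number of cluster units can be illustrated with a generic sketch. The paper's exact BIC formulation is not given here; the function below scores a k-means clustering under a common spherical-Gaussian approximation (shared variance across clusters), where a lower BIC favors a better fit penalized by model size. All names are illustrative, not the authors' implementation.

```python
import numpy as np

def kmeans_bic(X, labels, centers):
    """Approximate BIC for a k-means clustering.

    Treats each cluster as a spherical Gaussian with one shared variance
    (a standard heuristic; the paper's exact formulation may differ).
    Lower BIC = better trade-off between fit and number of clusters.
    """
    n, d = X.shape
    k = centers.shape[0]
    # Residual sum of squares of points to their assigned centers
    rss = ((X - centers[labels]) ** 2).sum()
    # Shared spherical variance estimate
    var = rss / (n * d)
    # Log-likelihood under the spherical-Gaussian assumption
    log_lik = -0.5 * n * d * (np.log(2 * np.pi * var) + 1)
    # Free parameters: k*d center coordinates + 1 variance + (k-1) mixing weights
    n_params = k * d + 1 + (k - 1)
    return -2 * log_lik + n_params * np.log(n)
```

In the spirit of the paper, one would sweep the number of clusters used for HuBERT's pseudo-labels and prefer the value where BIC stops improving.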

Original language: English
Title of host publication: 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 7107-7111
Number of pages: 5
ISBN (Electronic): 9781665405409
DOIs
Publication status: Published - 2022
Externally published: Yes
Event: 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Virtual, Online, Singapore
Duration: 2022 May 23 - 2022 May 27

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2022-May
ISSN (Print): 1520-6149

Conference

Conference: 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022
Country/Territory: Singapore
City: Virtual, Online
Period: 22/5/23 - 22/5/27

Keywords

  • acoustic unit discovery
  • BIC
  • HuBERT
  • self-supervised learning
  • unit-based language model

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
