Enhancing Spectrogram for Audio Classification Using Time-Frequency Enhancer

Haoran Xing*, Shiqi Zhang*, Daiki Takeuchi, Daisuke Niizumi, Noboru Harada, Shoji Makino*

*Corresponding author for this work

Research output: Conference contribution

Abstract

It is challenging to deploy Transformer-based audio classification models on common terminal devices in real-world settings because of their high computational cost. This increases the importance of transferring knowledge from a larger Transformer-based model to a smaller convolutional neural network (CNN)-based model via knowledge distillation (KD). Since an audio spectrogram can be regarded as an image, several studies use image-based models with CNN-based structures as the smaller model for KD. However, the physical meaning of a spectrogram generally differs from that of an image. This difference may prevent an image-based model from effectively extracting features from a raw spectrogram. We therefore hypothesize that improving the spectrogram can help these models perform better on audio classification tasks. To test this hypothesis, we propose a new Time-Frequency Enhancer (TFE), which is designed to learn how to enhance input spectrograms so that they are effective for audio classification. We also propose TFE-ENV2, which extends EfficientNetV2 (ENV2), an image-based backbone model. To verify the effectiveness of the proposed method, we compare the performance of the original ENV2 and the proposed TFE-ENV2. In our experiments, TFE-ENV2 outperformed the original ENV2 on the ESC-50 and Speech Commands V2 datasets, demonstrating that the proposed TFE enhances spectrograms in a way that assists image-based models in audio classification.
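For readers unfamiliar with KD, the following is a minimal sketch of the widely used soft-target distillation loss, in which a student is trained on a weighted sum of hard-label cross-entropy and the divergence from a teacher's temperature-softened outputs. It is a generic formulation and not necessarily the loss used in this paper; the function names and the `temperature` and `alpha` defaults are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of hard-label cross-entropy and soft-target KL divergence.

    temperature and alpha are hypothetical defaults, not values from the paper.
    """
    # Hard-label term: cross-entropy of the student at temperature 1.
    p_student = softmax(student_logits)
    ce = -math.log(p_student[label])

    # Soft-target term: KL(teacher || student) on softened distributions.
    q_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s)
             for t, s in zip(q_teacher, q_student) if t > 0)

    # The T^2 factor keeps the soft-target gradients on a comparable scale.
    return alpha * ce + (1.0 - alpha) * (temperature ** 2) * kl
```

In this sketch the teacher would correspond to the larger Transformer-based model and the student to the smaller CNN-based model; when the two produce identical logits, the soft-target term vanishes and only the cross-entropy term remains.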

Original language: English
Title of host publication: 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1155-1160
Number of pages: 6
ISBN (Electronic): 9798350300673
Publication status: Published - 2023
Event: 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023 - Taipei, Taiwan, Province of China
Duration: 31 Oct 2023 to 3 Nov 2023

Publication series

Name: 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023

Conference

Conference: 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023
Country/Territory: Taiwan, Province of China
City: Taipei
Period: 23/10/31 to 23/11/3

ASJC Scopus subject areas

  • Hardware and Architecture
  • Signal Processing
  • Artificial Intelligence
  • Computer Science Applications
