Low-Resource Contextual Topic Identification on Speech

Chunxi Liu, Matthew Wiesner, Shinji Watanabe, Craig Harman, Jan Trmal, Najim Dehak, Sanjeev Khudanpur

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)


In topic identification (topic ID) on real-world unstructured audio, an audio recording containing multiple topic shifts is first broken into sequential segments, and each segment is independently classified. We first present a general-purpose method for topic ID on spoken segments in low-resource languages, using a cascade of universal acoustic modeling, translation lexicons to English, and English-language topic classification. Next, instead of classifying each segment independently, we demonstrate that exploiting the contextual dependencies across sequential segments can provide large improvements. In particular, we propose an attention-based contextual model which is able to leverage the contexts in a selective manner. We test both our contextual and non-contextual models on four LORELEI languages, and on all but one, our attention-based contextual model significantly outperforms the context-independent models.
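To illustrate the core idea of attending selectively over neighboring segments, here is a minimal sketch using simple dot-product attention over fixed segment embeddings. This is an assumption-laden toy, not the paper's actual architecture (which the abstract only describes at a high level, and which uses recurrent neural networks); the embedding dimension, scoring function, and data are invented for illustration.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attend(segments, t):
    """Attention-weighted context vector for segment t.

    segments: (T, d) array of segment embeddings for one recording.
    Scores every segment by its dot product with segment t, converts
    the scores to weights with softmax, and returns the weighted sum
    of all segment embeddings (plus the weights themselves).
    """
    scores = segments @ segments[t]   # (T,) relevance of each segment to segment t
    alpha = softmax(scores)           # attention weights, non-negative, sum to 1
    context = alpha @ segments        # (d,) selectively mixed context representation
    return context, alpha

# toy example: 5 sequential segments with 4-dimensional embeddings
rng = np.random.default_rng(0)
segs = rng.normal(size=(5, 4))
ctx, alpha = attend(segs, 2)
```

The context vector `ctx` could then be combined with the segment's own representation before topic classification, letting the classifier weight surrounding segments selectively rather than treating each segment in isolation.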

Original language: English
Title of host publication: 2018 IEEE Spoken Language Technology Workshop, SLT 2018 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 8
ISBN (Electronic): 9781538643341
Publication status: Published - 2018 Jul 2
Externally published: Yes
Event: 2018 IEEE Spoken Language Technology Workshop, SLT 2018 - Athens, Greece
Duration: 2018 Dec 18 - 2018 Dec 21

Publication series

Name: 2018 IEEE Spoken Language Technology Workshop, SLT 2018 - Proceedings


Conference: 2018 IEEE Spoken Language Technology Workshop, SLT 2018


Keywords

  • Topic identification
  • attention
  • recurrent neural networks
  • universal acoustic modeling

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Human-Computer Interaction
  • Linguistics and Language


