JNAS: Japanese speech corpus for large vocabulary continuous speech recognition research

Katunobu Itou*, Mikio Yamamoto, Kazuya Takeda, Toshiyuki Takezawa, Tatsuo Matsuoka, Tetsunori Kobayashi, Kiyohiro Shikano, Shuichi Itahashi

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

239 Citations (Scopus)

Abstract

In this paper we present JNAS (Japanese Newspaper Article Sentences), the first public Japanese speech corpus for large vocabulary continuous speech recognition (LVCSR) technology. We designed it to be comparable to the corpora used in the American and European LVCSR projects. The corpus contains 60 h of speech recordings and their orthographic transcriptions for 306 speakers (153 males and 153 females) reading excerpts from newspaper articles and phonetically balanced (PB) sentences. In total, the corpus contains utterances of about 45,000 sentences, with each speaker reading about 150 sentences. JNAS is being distributed on 16 CD-ROMs.

Original language: English
Pages (from-to): 199-206
Number of pages: 8
Journal: Journal of the Acoustical Society of Japan (E) (English translation of Nippon Onkyo Gakkaishi)
Volume: 20
Issue number: 3
Publication status: Published - 1999
Externally published: Yes

ASJC Scopus subject areas

  • Acoustics and Ultrasonics
