Abstract
Clean speech data is necessary for spoken language processing; however, there is no public Japanese dialect corpus collected for speech processing. Parallel speech corpora of dialects are also important because real dialects influence one another; however, the existing data includes only noisy dialect speech and its translation into the common language. In this paper, we collected parallel speech corpora of Japanese dialects: 100 read-speech utterances from 25 dialect speakers and their phoneme transcriptions. We recorded the speech of 5 common language speakers and 20 dialect speakers from 4 areas, with 5 speakers from each area. Each dialect speaker converted the same common language texts into their own dialect and read them aloud. The speech was recorded with a close-talking microphone for use in spoken language processing (recognition, synthesis, and pronunciation estimation). In the experiments, the accuracy of automatic speech recognition (ASR) and kana-kanji conversion (KKC) systems improved when the systems were adapted with the data.
Original language | English |
---|---|
Title of host publication | Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016 |
Publisher | European Language Resources Association (ELRA) |
Pages | 4652-4657 |
Number of pages | 6 |
ISBN (Electronic) | 9782951740891 |
Publication status | Published - 1 Jan 2016 |
Event | 10th International Conference on Language Resources and Evaluation, LREC 2016 - Portoroz, Slovenia. Duration: 23 May 2016 → 28 May 2016 |
Other
Other | 10th International Conference on Language Resources and Evaluation, LREC 2016 |
---|---|
Country/Territory | Slovenia |
City | Portoroz |
Period | 16/5/23 → 16/5/28 |
ASJC Scopus subject areas
- Linguistics and Language
- Library and Information Sciences
- Language and Linguistics
- Education