Abstract
This paper is concerned with automatic speech recognition (ASR) for accented speech. Given a small amount of speech from a new speaker, is it better to apply speaker adaptation to the baseline, or to use accent identification (AID) to identify the speaker’s accent and select an accent-dependent acoustic model? Three accent-based model selection methods are investigated: using the ‘true’ accent model, and unsupervised model selection using i-Vector and phonotactic-based AID. All three methods outperform the unadapted baseline. Most significantly, AID-based model selection using 43 s of speech performs better than unsupervised speaker adaptation, even if the latter uses five times more adaptation data. Combining unsupervised AID-based model selection and speaker adaptation gives an average relative reduction in ASR error rate of up to 47%.
Original language | English |
---|---|
Title of host publication | INTERSPEECH 2014 |
Publisher | ISCA |
Pages | 2967-2971 |
Number of pages | 5 |
Publication status | Published - 2014 |
Event | Interspeech 2014 - Singapore, Singapore Duration: 14 Sept 2014 → 18 Sept 2014 |
Conference
Conference | Interspeech 2014 |
---|---|
Country/Territory | Singapore |
City | Singapore |
Period | 14/09/14 → 18/09/14 |
Keywords
- speech recognition
- model selection
- accent recognition
- i-vector