Unsupervised Model Selection for Recognition of Regional Accented Speech

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Authors

Colleges, School and Institutes

External organisations

  • University of East Anglia

Abstract

This paper is concerned with automatic speech recognition (ASR) for accented speech. Given a small amount of speech from a new speaker, is it better to apply speaker adaptation to the baseline, or to use accent identification (AID) to identify the speaker’s accent and select an accent-dependent acoustic model? Three accent-based model selection methods are investigated: using the ‘true’ accent model, and unsupervised model selection using i-Vector and phonotactic-based AID. All three methods outperform the unadapted baseline. Most significantly, AID-based model selection using 43 s of speech performs better than unsupervised speaker adaptation, even if the latter uses five times more adaptation data. Combining unsupervised AID-based model selection and speaker adaptation gives an average relative reduction in ASR error rate of up to 47%.
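
The model selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each speaker and each accent is summarised by a fixed-length vector (standing in for an i-vector), that accents are scored by cosine similarity, and that the accent labels, model names, and numbers are hypothetical toy values. The helper `relative_error_reduction` shows how a relative reduction in error rate is normally computed.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_accent_model(speaker_vector, accent_centroids, models, baseline):
    """Return the model for the accent whose centroid is closest to the speaker.

    Assumption: accents are ranked by cosine similarity; the paper's AID
    systems may score accents differently. Falls back to the baseline model
    when no accent centroid or matching model is available.
    """
    if not accent_centroids:
        return baseline
    best_accent = max(
        accent_centroids,
        key=lambda accent: cosine(speaker_vector, accent_centroids[accent]),
    )
    return models.get(best_accent, baseline)


def relative_error_reduction(baseline_error, new_error):
    """Relative reduction in error rate, in percent of the baseline error."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Toy values only (hypothetical accents, models, and error rates).
centroids = {"accent_a": [1.0, 0.0], "accent_b": [0.0, 1.0]}
models = {"accent_a": "model_a", "accent_b": "model_b"}
chosen = select_accent_model([0.9, 0.2], centroids, models, "baseline_model")
reduction = relative_error_reduction(20.0, 10.0)
```

With these toy inputs the speaker vector lies closest to `accent_a`, so `model_a` is selected, and halving an error rate of 20 to 10 corresponds to a 50% relative reduction.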

Details

Original language: English
Title of host publication: INTERSPEECH 2014
Publication status: Published - 2014
Event: Interspeech 2014 - Singapore, Singapore
Duration: 14 Sep 2014 to 18 Sep 2014

Conference

Conference: Interspeech 2014
Country: Singapore
City: Singapore
Period: 14/09/14 to 18/09/14

Keywords

  • speech recognition, model selection, accent recognition, i-vector