Unsupervised Model Selection for Recognition of Regional Accented Speech

Maryam Najafian; Andrea DeMarco; Stephen Cox; Martin Russell

Electronic, Electrical and Systems Engineering

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

16 Citations (Scopus)

143 Downloads (Pure)

Abstract

This paper is concerned with automatic speech recognition (ASR) for accented speech. Given a small amount of speech from a new speaker, is it better to apply speaker adaptation to the baseline, or to use accent identification (AID) to identify the speaker’s accent and select an accent-dependent acoustic model? Three accent-based model selection methods are investigated: using the ‘true’ accent model, and unsupervised model selection using i-Vector and phonotactic-based AID. All three methods outperform the unadapted baseline. Most significantly, AID-based model selection using 43s of speech performs better than unsupervised speaker adaptation, even if the latter uses five times more adaptation data. Combining unsupervised AID-based model selection and speaker adaptation gives an average relative reduction in ASR error rate of up to 47%.
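As an illustration of the two ideas in the abstract, the sketch below picks an accent-dependent model from AID scores and computes a relative reduction in error rate. It is a minimal sketch, not the authors' implementation: the function names, the accent labels, the scores, and the error rates are all hypothetical and are not taken from the paper.

```python
from typing import Dict


def select_accent_model(aid_scores: Dict[str, float]) -> str:
    """Return the accent label with the highest AID score.

    The caller would then load the acoustic model trained for that accent.
    """
    if not aid_scores:
        raise ValueError("aid_scores must not be empty")
    return max(aid_scores, key=aid_scores.get)


def relative_error_reduction(baseline_rate: float, new_rate: float) -> float:
    """Relative reduction in error rate, as a fraction of the baseline."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (baseline_rate - new_rate) / baseline_rate


# Hypothetical AID scores for one speaker (illustrative labels only).
scores = {"accent_a": 0.12, "accent_b": 0.71, "accent_c": 0.17}
chosen = select_accent_model(scores)

# Hypothetical error rates in percent, NOT results from the paper.
reduction = relative_error_reduction(baseline_rate=20.0, new_rate=15.0)
print(chosen, f"{reduction:.0%}")
```

A relative reduction is measured against the baseline error rate, so a drop from 20% to 15% is a 25% relative reduction, not a 5% one.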

Original language	English
Title of host publication	INTERSPEECH 2014
Publisher	ISCA
Pages	2967-2971
Number of pages	5
Publication status	Published - 2014
Event	Interspeech 2014 - Singapore, Singapore Duration: 14 Sept 2014 → 18 Sept 2014

Conference

Conference	Interspeech 2014
Country/Territory	Singapore
City	Singapore
Period	14/09/14 → 18/09/14

Keywords

speech recognition
Model selection
accent recognition
i-vector

Access to Document

Najafian_et_al_Unsupervised_Model_Selection_Interspeech_2014
Eligibility for repository checked July 2015
Accepted author manuscript, 205 KB
Licence: None: All rights reserved

http://www.isca-speech.org/archive/interspeech_2014/i14_2967.html

Cite this

@inproceedings{5aba59b3a515488588f3ace1d49d8719,

title = "Unsupervised Model Selection for Recognition of Regional Accented Speech",

abstract = "This paper is concerned with automatic speech recognition (ASR) for accented speech. Given a small amount of speech from a new speaker, is it better to apply speaker adaptation to the baseline, or to use accent identification (AID) to identify the speaker{\textquoteright}s accent and select an accent-dependent acoustic model? Three accent-based model selection methods are investigated: using the {\textquoteleft}true{\textquoteright} accent model, and unsupervised model selection using i-Vector and phonotactic-based AID. All three methods outperform the unadapted baseline. Most significantly, AID-based model selection using 43s of speech performs better than unsupervised speaker adaptation, even if the latter uses five times more adaptation data. Combining unsupervised AID-based model selection and speaker adaptation gives an average relative reduction in ASR error rate of up to 47%.",

keywords = "speech recognition, Model selection, accent recognition, i-vector",

author = "Maryam Najafian and Andrea DeMarco and Stephen Cox and Martin Russell",

year = "2014",

language = "English",

pages = "2967--2971",

booktitle = "INTERSPEECH 2014",

publisher = "ISCA",

note = "Interspeech 2014 ; Conference date: 14-09-2014 Through 18-09-2014",

}

TY - GEN

T1 - Unsupervised Model Selection for Recognition of Regional Accented Speech

AU - Najafian, Maryam

AU - DeMarco, Andrea

AU - Cox, Stephen

AU - Russell, Martin

PY - 2014

Y1 - 2014

N2 - This paper is concerned with automatic speech recognition (ASR) for accented speech. Given a small amount of speech from a new speaker, is it better to apply speaker adaptation to the baseline, or to use accent identification (AID) to identify the speaker’s accent and select an accent-dependent acoustic model? Three accent-based model selection methods are investigated: using the ‘true’ accent model, and unsupervised model selection using i-Vector and phonotactic-based AID. All three methods outperform the unadapted baseline. Most significantly, AID-based model selection using 43s of speech performs better than unsupervised speaker adaptation, even if the latter uses five times more adaptation data. Combining unsupervised AID-based model selection and speaker adaptation gives an average relative reduction in ASR error rate of up to 47%.

AB - This paper is concerned with automatic speech recognition (ASR) for accented speech. Given a small amount of speech from a new speaker, is it better to apply speaker adaptation to the baseline, or to use accent identification (AID) to identify the speaker’s accent and select an accent-dependent acoustic model? Three accent-based model selection methods are investigated: using the ‘true’ accent model, and unsupervised model selection using i-Vector and phonotactic-based AID. All three methods outperform the unadapted baseline. Most significantly, AID-based model selection using 43s of speech performs better than unsupervised speaker adaptation, even if the latter uses five times more adaptation data. Combining unsupervised AID-based model selection and speaker adaptation gives an average relative reduction in ASR error rate of up to 47%.

KW - speech recognition

KW - Model selection

KW - accent recognition

KW - i-vector

M3 - Conference contribution

SP - 2967

EP - 2971

BT - INTERSPEECH 2014

PB - ISCA

T2 - Interspeech 2014

Y2 - 14 September 2014 through 18 September 2014

ER -
