Acoustic model selection using limited data for accent robust speech recognition

Maryam Najafian; Saeid Safavi; Abualsoud Hanani; Martin Russell

Electronic, Electrical and Systems Engineering

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper investigates techniques to compensate for the effects of regional accents of British English on automatic speech recognition (ASR) performance. Given a small amount of speech from a new speaker, is it better to apply speaker adaptation, or to use accent identification (AID) to identify the speaker's accent and then apply accent-dependent ASR? Three approaches to accent-dependent modelling are investigated: using the 'correct' accent model, choosing a model using supervised (ACCDIST-based) AID, and building a model using data from neighbouring speakers in 'AID space'. All of the methods outperform the accent-independent model, with relative reductions in ASR error rate of up to 44%. Using on average 43 s of speech to identify an appropriate accent-dependent model outperforms using the same speech for supervised speaker adaptation by 7%.
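The two selection strategies named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how 'AID space' vectors are formed or which distance is used, so the Euclidean distance, the per-accent centroids, and all function and variable names here are assumptions made for illustration. The nearest-centroid rule is a simple stand-in for ACCDIST-based AID, not a description of it.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_accent_model(speaker_vec, accent_centroids):
    """Pick the accent whose (hypothetical) centroid is closest to the speaker.

    accent_centroids maps an accent label to a vector in the assumed AID space.
    """
    return min(accent_centroids,
               key=lambda accent: euclidean(speaker_vec, accent_centroids[accent]))


def nearest_speakers(speaker_vec, training_speakers, k):
    """Return the ids of the k training speakers closest to the new speaker.

    Their data could then be pooled to build a speaker-specific model.
    """
    ranked = sorted(training_speakers,
                    key=lambda sid: euclidean(speaker_vec, training_speakers[sid]))
    return ranked[:k]


def relative_reduction(baseline_error, new_error):
    """Relative error-rate reduction in percent, as quoted in the abstract."""
    return 100.0 * (baseline_error - new_error) / baseline_error
```

With made-up values, a baseline error rate of 20.0 and a new error rate of 15.0 give `relative_reduction(20.0, 15.0) == 25.0`. These numbers are purely illustrative and are not results from the paper.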

Original language	English
Title of host publication	2014 Proceedings of the 22nd European Signal Processing Conference (EUSIPCO)
Publisher	Institute of Electrical and Electronics Engineers (IEEE)
Pages	1786 - 1790
Number of pages	5
ISBN (Print)	978-0-9928626-1-9
Publication status	Published - 2014
Event	22nd European Signal Processing Conference, EUSIPCO 2014 - Lisbon, Portugal. Duration: 1 Sept 2014 → 5 Sept 2014

Conference

Conference	22nd European Signal Processing Conference, EUSIPCO 2014
Country/Territory	Portugal
City	Lisbon
Period	1/09/14 → 5/09/14

Keywords

speech recognition
accent recognition

Access to Document

Najafian_et_al_Acoustic_model_selection_2014_Post-Print
(c) 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works. Checked June 2015
Accepted author manuscript, 124 KB
Licence: Other (please specify with Rights Statement)

Cite this

@inproceedings{000bdbbaa9b4447f9ccf63e0363d6b52,
title = "Acoustic model selection using limited data for accent robust speech recognition",
abstract = "This paper investigates techniques to compensate for the effects of regional accents of British English on automatic speech recognition (ASR) performance. Given a small amount of speech from a new speaker, is it better to apply speaker adaptation, or to use accent identification (AID) to identify the speaker's accent followed by accent-dependent ASR? Three approaches to accent-dependent modelling are investigated: using the `correct' accent model, choosing a model using supervised (ACCDIST-based) accent identification (AID), and building a model using data from neighbouring speakers in `AID space'. All of the methods outperform the accent-independent model, with relative reductions in ASR error rate of up to 44%. Using on average 43s of speech to identify an appropriate accent-dependent model outperforms using it for supervised speaker-adaptation, by 7%.",
keywords = "speech recognition, accent recognition",
author = "Maryam Najafian and Saeid Safavi and Abualsoud Hanani and Martin Russell",
year = "2014",
language = "English",
isbn = "978-0-9928626-1-9",
pages = "1786--1790",
booktitle = "2014 Proceedings of the 22nd European Signal Processing Conference (EUSIPCO)",
publisher = "Institute of Electrical and Electronics Engineers (IEEE)",
note = "22nd European Signal Processing Conference, EUSIPCO 2014 ; Conference date: 01-09-2014 Through 05-09-2014",
}

Najafian, M, Safavi, S, Hanani, A & Russell, M 2014, Acoustic model selection using limited data for accent robust speech recognition. in 2014 Proceedings of the 22nd European Signal Processing Conference (EUSIPCO). Institute of Electrical and Electronics Engineers (IEEE), pp. 1786-1790, 22nd European Signal Processing Conference, EUSIPCO 2014, Lisbon, Portugal, 1/09/14.
