Automatic accent identification as an analytical tool for accent robust automatic speech recognition

Maryam Najafian; Martin Russell

doi:10.1016/j.specom.2020.05.003

Research output: Contribution to journal, Article (peer-reviewed)

Abstract

We present a novel study of the relationship between automatic accent identification (AID) and accent-robust automatic speech recognition (ASR), using i-vector-based AID and a hybrid deep neural network/hidden Markov model (DNN-HMM) ASR system. We present a visualization of the AID i-vector space and a novel analysis of the accent content of the WSJCAM0 corpus. Accents that lie at the periphery of the AID space are referred to as “extreme”. We demonstrate a negative correlation across accents between AID accuracy and ASR accuracy: extreme accents yield the highest AID accuracy and the lowest ASR accuracy. These relationships between accents inform a set of ASR experiments in which a generic training set (WSJCAM0) is supplemented with a fixed amount of accented data from the ABI (Accents of the British Isles) corpus. The best performance across all accents, a 32% relative reduction in errors compared with the baseline ASR system, is obtained when the supplementary data consists of extreme-accented speech, even though such speech accounts for just 14% of the test data. We conclude that i-vector-based AID analysis provides a principled approach to selecting training material for accent-robust ASR. We speculate that this may generalize to other detection technologies and other types of variability, for example speaker identification (SI) and speaker variability.
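The abstract does not state how “periphery”, the AID/ASR correlation, or the relative error reduction are computed. The Python sketch below illustrates one plausible reading of each, under explicit assumptions. First, an accent's peripherality is approximated as the Euclidean distance of its mean i-vector from the centroid of all accent means (an assumption; the paper may define “extreme” differently). Second, the correlation is taken as a Pearson coefficient between per-accent AID and ASR accuracy. Third, relative error reduction is the fraction of baseline errors removed. The function names, accent labels "A", "B", "C", and every number are hypothetical toy values, not results from the paper, and real i-vectors have far more than two dimensions.

```python
import math
from typing import Dict, List, Sequence


def centroid(vectors: List[Sequence[float]]) -> List[float]:
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def peripherality(accent_means: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Euclidean distance of each accent's mean i-vector from the centroid
    of all accent means. ASSUMPTION: a larger distance is read as lying
    nearer the periphery of AID space ("extreme"); the paper may differ."""
    c = centroid(list(accent_means.values()))
    return {accent: math.dist(mean, c) for accent, mean in accent_means.items()}


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def relative_error_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline errors removed (0.32 means a 32% reduction)."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical toy values for three made-up accents; none come from the paper.
    means = {"A": [0.1, 0.0], "B": [-0.1, 0.1], "C": [2.0, 1.5]}
    dist = peripherality(means)
    print("most peripheral accent:", max(dist, key=dist.get))

    aid_acc = [0.60, 0.70, 0.90]  # toy per-accent AID accuracy
    asr_acc = [0.90, 0.85, 0.70]  # toy per-accent ASR word accuracy
    print("AID/ASR correlation:", round(pearson(aid_acc, asr_acc), 3))

    # A baseline WER of 10.0% falling to 6.8% is a 32% relative reduction.
    print("relative reduction:", round(relative_error_reduction(10.0, 6.8), 2))
```

Distance from the centroid of accent means is only one simple proxy for “periphery”. A distance computed after the within-class covariance normalization or LDA projection commonly applied to i-vectors might match the AID classifier's geometry more closely.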

Original language	English
Pages (from-to)	44-55
Number of pages	12
Journal	Speech Communication
Volume	122
Early online date	4 Jun 2020
DOIs	https://doi.org/10.1016/j.specom.2020.05.003
Publication status	Published - Sept 2020

Keywords

Speech recognition
accent identification
British accents
i-vector
Accent identification
I-vector

ASJC Scopus subject areas

Software
Communication
Language and Linguistics
Computer Vision and Pattern Recognition
Computer Science Applications
Modelling and Simulation
Linguistics and Language

Access to Document

10.1016/j.specom.2020.05.003 (Licence: None: All rights reserved)

Najafian_&_Russell_Automatic_accent_identification_Speech_Communication_2020: Accepted author manuscript, 695 KB (Licence: Creative Commons: Attribution-NonCommercial-NoDerivs (CC BY-NC-ND))

https://www.sciencedirect.com/science/article/pii/S0167639317300043 (Licence: None: All rights reserved)

Cite this

@article{3d30edcad914492c945c8915d2592548,
  title = "Automatic accent identification as an analytical tool for accent robust automatic speech recognition",
  abstract = "We present a novel study of the relationship between automatic accent identification (AID) and accent-robust automatic speech recognition (ASR), using i-vector-based AID and a hybrid deep neural network/hidden Markov model (DNN-HMM) ASR system. We present a visualization of the AID i-vector space and a novel analysis of the accent content of the WSJCAM0 corpus. Accents that lie at the periphery of the AID space are referred to as “extreme”. We demonstrate a negative correlation across accents between AID accuracy and ASR accuracy: extreme accents yield the highest AID accuracy and the lowest ASR accuracy. These relationships between accents inform a set of ASR experiments in which a generic training set (WSJCAM0) is supplemented with a fixed amount of accented data from the ABI (Accents of the British Isles) corpus. The best performance across all accents, a 32% relative reduction in errors compared with the baseline ASR system, is obtained when the supplementary data consists of extreme-accented speech, even though such speech accounts for just 14% of the test data. We conclude that i-vector-based AID analysis provides a principled approach to selecting training material for accent-robust ASR. We speculate that this may generalize to other detection technologies and other types of variability, for example speaker identification (SI) and speaker variability.",
  keywords = "Speech recognition, accent identification, British accents, i-vector",
  author = "Maryam Najafian and Martin Russell",
  year = "2020",
  month = sep,
  doi = "10.1016/j.specom.2020.05.003",
  language = "English",
  volume = "122",
  pages = "44--55",
  journal = "Speech Communication",
  issn = "0167-6393",
  publisher = "Elsevier",
}

TY  - JOUR
T1  - Automatic accent identification as an analytical tool for accent robust automatic speech recognition
AU  - Najafian, Maryam
AU  - Russell, Martin
PY  - 2020/9
Y1  - 2020/9
AB  - We present a novel study of the relationship between automatic accent identification (AID) and accent-robust automatic speech recognition (ASR), using i-vector-based AID and a hybrid deep neural network/hidden Markov model (DNN-HMM) ASR system. We present a visualization of the AID i-vector space and a novel analysis of the accent content of the WSJCAM0 corpus. Accents that lie at the periphery of the AID space are referred to as “extreme”. We demonstrate a negative correlation across accents between AID accuracy and ASR accuracy: extreme accents yield the highest AID accuracy and the lowest ASR accuracy. These relationships between accents inform a set of ASR experiments in which a generic training set (WSJCAM0) is supplemented with a fixed amount of accented data from the ABI (Accents of the British Isles) corpus. The best performance across all accents, a 32% relative reduction in errors compared with the baseline ASR system, is obtained when the supplementary data consists of extreme-accented speech, even though such speech accounts for just 14% of the test data. We conclude that i-vector-based AID analysis provides a principled approach to selecting training material for accent-robust ASR. We speculate that this may generalize to other detection technologies and other types of variability, for example speaker identification (SI) and speaker variability.
KW  - Speech recognition
KW  - accent identification
KW  - British accents
KW  - i-vector
UR  - http://www.scopus.com/inward/record.url?scp=85086633212&partnerID=8YFLogxK
U2  - 10.1016/j.specom.2020.05.003
DO  - 10.1016/j.specom.2020.05.003
M3  - Article
SN  - 0167-6393
VL  - 122
SP  - 44
EP  - 55
JO  - Speech Communication
JF  - Speech Communication
ER  -
