Abstract
We present a novel study of the relationships between automatic accent identification (AID) and accent-robust automatic speech recognition (ASR), using i-vector based AID and deep neural network-hidden Markov model (DNN-HMM) based ASR. A visualization of the AID i-vector space and a novel analysis of the accent content of the WSJCAM0 corpus are presented. Accents that occur at the periphery of AID space are referred to as “extreme”. We demonstrate a negative correlation, with respect to accent, between AID and ASR accuracy, where extreme accents exhibit the highest AID and lowest ASR performance. These relationships between accents inform a set of ASR experiments in which a generic training set (WSJCAM0) is supplemented with a fixed amount of accented data from the ABI (Accents of the British Isles) corpus. The best performance across all accents, a 32% relative reduction in errors compared with the baseline ASR system, is obtained when the supplementary data comprises extreme accented speech, even though this accent accounts for just 14% of the test data. We conclude that i-vector based AID analysis provides a principled approach to the selection of training material for accent-robust ASR. We speculate that this may generalize to other detection technologies and other types of variability, for example speaker identification (SI) and speaker variability.
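The "relative reduction in errors" quoted in the abstract is a ratio against the baseline error rate, not a difference in percentage points. The following is a minimal sketch of that arithmetic. The function name and the two error rates are hypothetical, chosen only for illustration, and are not results from the paper; the abstract also does not state which error metric is used.

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Return the fraction of the baseline's errors removed by the new system."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical error rates (NOT taken from the paper), used only to
# illustrate how a relative reduction is computed.
baseline = 0.25      # assumed error rate of a baseline system
supplemented = 0.17  # assumed error rate after adding supplementary data

reduction = relative_error_reduction(baseline, supplemented)
print(f"Relative error reduction: {reduction:.0%}")
```

With these assumed values the absolute drop is 8 percentage points, while the relative reduction is 8/25 of the baseline errors.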
Original language | English |
---|---|
Pages (from-to) | 44-55 |
Number of pages | 12 |
Journal | Speech Communication |
Volume | 122 |
Early online date | 4 Jun 2020 |
Publication status | Published - Sept 2020 |
Keywords
- Speech recognition
- accent identification
- British accents
- i-vector
ASJC Scopus subject areas
- Software
- Communication
- Language and Linguistics
- Computer Vision and Pattern Recognition
- Computer Science Applications
- Modelling and Simulation
- Linguistics and Language