A Multi-Level Linear/Linear Segmental HMM with a Formant-Based Intermediate Layer
Research output: Contribution to journal › Article
A novel multi-level segmental HMM (MSHMM) is presented in which the relationship between symbolic (phonetic) and surface (acoustic) representations of speech is regulated by an intermediate 'articulatory' representation. Speech dynamics are characterised as linear trajectories in the articulatory space, which are transformed into the acoustic space using an articulatory-to-acoustic mapping, and recognition is then performed. The results of phonetic classification experiments are presented for monophone and triphone MSHMMs using three formant-based 'articulatory' parameterisations and sets of between 1 and 49 linear articulatory-to-acoustic mappings. The NIST Matched Pair Sentence Segment (Word Error) test shows that, for a sufficiently rich combination of articulatory parameterisation and mappings, differences between these results and those obtained with an optimal classifier are not statistically significant. It is also shown that, compared with a conventional HMM, superior performance can be achieved using an MSHMM with 25% fewer parameters. © 2004 Elsevier Ltd. All rights reserved.
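The core idea in the abstract, a linear trajectory in the articulatory space passed through a linear articulatory-to-acoustic mapping, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the affine offset `b`, the diagonal Gaussian scoring, and all numeric values below are assumptions made purely for illustration.

```python
import math

def linear_trajectory(midpoint, slope, length):
    """Return `length` points on a straight line centred on `midpoint`.

    `midpoint` and `slope` are hypothetical per-dimension parameters of one
    segment in the intermediate (e.g. formant-based) space.
    """
    centre = (length - 1) / 2.0
    return [[m + s * (t - centre) for m, s in zip(midpoint, slope)]
            for t in range(length)]

def apply_mapping(W, b, x):
    """Map an intermediate-layer vector x to the acoustic space as W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def segment_log_likelihood(frames, midpoint, slope, W, b, variances):
    """Score acoustic frames against the mapped trajectory.

    Assumption: each frame is modelled by a diagonal-covariance Gaussian
    whose mean is the mapped trajectory point for that frame.
    """
    trajectory = linear_trajectory(midpoint, slope, len(frames))
    total = 0.0
    for frame, point in zip(frames, trajectory):
        mean = apply_mapping(W, b, point)
        for y, mu, var in zip(frame, mean, variances):
            total += -0.5 * (math.log(2 * math.pi * var) + (y - mu) ** 2 / var)
    return total

# Illustrative values only: a two-dimensional "formant" trajectory over
# three frames, mapped by a made-up 2x2 linear transform.
midpoint = [500.0, 1500.0]
slope = [10.0, -20.0]
W = [[0.01, 0.0], [0.0, 0.001]]
b = [0.0, 0.0]
variances = [1.0, 1.0]

mapped = [apply_mapping(W, b, p) for p in linear_trajectory(midpoint, slope, 3)]
score_exact = segment_log_likelihood(mapped, midpoint, slope, W, b, variances)
shifted = [[v + 1.0 for v in frame] for frame in mapped]
score_shifted = segment_log_likelihood(shifted, midpoint, slope, W, b, variances)
```

In this sketch, frames that lie exactly on the mapped trajectory score higher than frames shifted away from it. Using a set of several mappings, as the abstract describes, would amount to choosing a different `W` (and `b`) per class of segment.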
Number of pages: 21
Journal: Computer Speech and Language
Publication status: Published - 1 Aug 2004