TY - JOUR
T1 - A Multi-Level Linear/Linear Segmental HMM with a Formant-Based Intermediate Layer
AU - Russell, Martin
AU - Jackson, PJB
PY - 2004/8/1
Y1 - 2004/8/1
N2 - A novel multi-level segmental HMM (MSHMM) is presented in which the relationship between symbolic (phonetic) and surface (acoustic) representations of speech is regulated by an intermediate 'articulatory' representation. Speech dynamics are characterised as linear trajectories in the articulatory space, which are transformed into the acoustic space using an articulatory-to-acoustic mapping. Recognition is then performed. The results of phonetic classification experiments are presented for monophone and triphone MSHMMs using three formant-based 'articulatory' parameterisations and sets of between 1 and 49 linear articulatory-to-acoustic mappings. The NIST Matched Pair Sentence Segment (Word Error) test shows that, for a sufficiently rich combination of articulatory parameterisation and mappings, differences between these results and those obtained with an optimal classifier are not statistically significant. It is also shown that, compared with a conventional HMM, superior performance can be achieved using an MSHMM with 25% fewer parameters. (C) 2004 Elsevier Ltd. All rights reserved.
AB - A novel multi-level segmental HMM (MSHMM) is presented in which the relationship between symbolic (phonetic) and surface (acoustic) representations of speech is regulated by an intermediate 'articulatory' representation. Speech dynamics are characterised as linear trajectories in the articulatory space, which are transformed into the acoustic space using an articulatory-to-acoustic mapping. Recognition is then performed. The results of phonetic classification experiments are presented for monophone and triphone MSHMMs using three formant-based 'articulatory' parameterisations and sets of between 1 and 49 linear articulatory-to-acoustic mappings. The NIST Matched Pair Sentence Segment (Word Error) test shows that, for a sufficiently rich combination of articulatory parameterisation and mappings, differences between these results and those obtained with an optimal classifier are not statistically significant. It is also shown that, compared with a conventional HMM, superior performance can be achieved using an MSHMM with 25% fewer parameters. (C) 2004 Elsevier Ltd. All rights reserved.
UR - http://www.scopus.com/inward/record.url?scp=10844250035&partnerID=8YFLogxK
U2 - 10.1016/j.csl.2004.08.001
DO - 10.1016/j.csl.2004.08.001
M3 - Article
VL - 19
SP - 205
EP - 225
JO - Computer Speech and Language
JF - Computer Speech and Language
ER -