A Multi-Level Linear/Linear Segmental HMM with a Formant-Based Intermediate Layer

Research output: Contribution to journal › Article

Abstract

A novel multi-level segmental HMM (MSHMM) is presented in which the relationship between symbolic (phonetic) and surface (acoustic) representations of speech is regulated by an intermediate 'articulatory' representation. Speech dynamics are characterised as linear trajectories in the articulatory space, which are transformed into the acoustic space using an articulatory-to-acoustic mapping, after which recognition is performed. The results of phonetic classification experiments are presented for monophone and triphone MSHMMs using three formant-based 'articulatory' parameterisations and sets of between 1 and 49 linear articulatory-to-acoustic mappings. The NIST Matched Pair Sentence Segment (Word Error) test shows that, for a sufficiently rich combination of articulatory parameterisation and mappings, differences between these results and those obtained with an optimal classifier are not statistically significant. It is also shown that, compared with a conventional HMM, superior performance can be achieved using an MSHMM with 25% fewer parameters. © 2004 Elsevier Ltd. All rights reserved.
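
The linear/linear idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the two-dimensional 'articulatory' space, the three-dimensional acoustic space and the diagonal-covariance Gaussian scoring are all assumptions chosen for illustration. A segment is modelled as a straight line in the articulatory space, each point on the line is passed through a linear map into the acoustic space, and the observed acoustic frames are scored against the mapped trajectory.

```python
import math


def linear_trajectory(midpoint, slope, n_frames):
    """Return n_frames points on a straight line centred on `midpoint`.

    `midpoint` and `slope` are hypothetical per-state parameters in the
    intermediate ('articulatory') space.
    """
    centre = (n_frames - 1) / 2.0
    return [
        [c + m * (t - centre) for c, m in zip(midpoint, slope)]
        for t in range(n_frames)
    ]


def apply_linear_map(matrix, bias, vector):
    """Map one articulatory vector into the acoustic space: W x + b."""
    return [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(matrix, bias)
    ]


def segment_log_likelihood(observations, midpoint, slope, matrix, bias, variances):
    """Score acoustic frames against the mapped linear trajectory.

    Assumes independent Gaussian noise with a diagonal covariance
    (`variances`) around each mapped trajectory point. This is a
    simplification adopted for the sketch only.
    """
    trajectory = linear_trajectory(midpoint, slope, len(observations))
    total = 0.0
    for obs, point in zip(observations, trajectory):
        mean = apply_linear_map(matrix, bias, point)
        for y, mu, var in zip(obs, mean, variances):
            total += -0.5 * (math.log(2 * math.pi * var) + (y - mu) ** 2 / var)
    return total


# Illustrative values only: a 2-D articulatory line mapped to 3-D acoustics.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [0.0, 0.0, 1.0]
line = linear_trajectory([1.0, 2.0], [0.5, -1.0], 3)
frames = [apply_linear_map(W, b, p) for p in line]
score = segment_log_likelihood(frames, [1.0, 2.0], [0.5, -1.0], W, b, [1.0, 1.0, 1.0])
```

In this sketch a set of several mappings, as in the experiments summarised above, would correspond to choosing a different `W` and `b` for different groups of phones. How the mappings are assigned and trained is not specified here.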

Details

Original language: English
Pages (from-to): 205-225
Number of pages: 21
Journal: Computer Speech and Language
Volume: 19
Publication status: Published - 1 Aug 2004