A Multi-Level Linear/Linear Segmental HMM with a Formant-Based Intermediate Layer

Martin Russell; PJB Jackson

doi:10.1016/j.csl.2004.08.001

Electronic, Electrical and Systems Engineering

Research output: Contribution to journal › Article

21 Citations (Scopus)

Abstract

A novel multi-level segmental HMM (MSHMM) is presented in which the relationship between symbolic (phonetic) and surface (acoustic) representations of speech is regulated by an intermediate 'articulatory' representation. Speech dynamics are characterised as linear trajectories in the articulatory space, which are transformed into the acoustic space using an articulatory-to-acoustic mapping, and recognition is then performed. The results of phonetic classification experiments are presented for monophone and triphone MSHMMs using three formant-based 'articulatory' parameterisations and sets of between 1 and 49 linear articulatory-to-acoustic mappings. The NIST Matched Pair Sentence Segment (Word Error) test shows that, for a sufficiently rich combination of articulatory parameterisation and mappings, differences between these results and those obtained with an optimal classifier are not statistically significant. It is also shown that, compared with a conventional HMM, superior performance can be achieved using an MSHMM with 25% fewer parameters. (C) 2004 Elsevier Ltd. All rights reserved.

Original language	English
Pages (from-to)	205-225
Number of pages	21
Journal	Computer Speech and Language
Volume	19
DOIs	https://doi.org/10.1016/j.csl.2004.08.001
Publication status	Published - 1 Aug 2004

Cite this

@article{49ef4bb60cb04144b4ec3818071edac0,

title = "A Multi-Level Linear/Linear Segmental HMM with a Formant-Based Intermediate Layer",

abstract = "A novel multi-level segmental HMM (MSHMM) is presented in which the relationship between symbolic (phonetic) and surface (acoustic) representations of speech is regulated by an intermediate 'articulatory' representation. Speech dynamics are characterised as linear trajectories in the articulatory space, which are transformed into the acoustic space using an articulatory-to-acoustic mapping, and recognition is then performed. The results of phonetic classification experiments are presented for monophone and triphone MSHMMs using three formant-based 'articulatory' parameterisations and sets of between 1 and 49 linear articulatory-to-acoustic mappings. The NIST Matched Pair Sentence Segment (Word Error) test shows that, for a sufficiently rich combination of articulatory parameterisation and mappings, differences between these results and those obtained with an optimal classifier are not statistically significant. It is also shown that, compared with a conventional HMM, superior performance can be achieved using an MSHMM with 25% fewer parameters. (C) 2004 Elsevier Ltd. All rights reserved.",

author = "Martin Russell and PJB Jackson",

year = "2004",

month = aug,

day = "1",

doi = "10.1016/j.csl.2004.08.001",

language = "English",

volume = "19",

pages = "205--225",

journal = "Computer Speech and Language",

publisher = "Elsevier",

}

TY - JOUR

T1 - A Multi-Level Linear/Linear Segmental HMM with a Formant-Based Intermediate Layer

AU - Russell, Martin

AU - Jackson, PJB

PY - 2004/8/1

Y1 - 2004/8/1

N2 - A novel multi-level segmental HMM (MSHMM) is presented in which the relationship between symbolic (phonetic) and surface (acoustic) representations of speech is regulated by an intermediate 'articulatory' representation. Speech dynamics are characterised as linear trajectories in the articulatory space, which are transformed into the acoustic space using an articulatory-to-acoustic mapping, and recognition is then performed. The results of phonetic classification experiments are presented for monophone and triphone MSHMMs using three formant-based 'articulatory' parameterisations and sets of between 1 and 49 linear articulatory-to-acoustic mappings. The NIST Matched Pair Sentence Segment (Word Error) test shows that, for a sufficiently rich combination of articulatory parameterisation and mappings, differences between these results and those obtained with an optimal classifier are not statistically significant. It is also shown that, compared with a conventional HMM, superior performance can be achieved using an MSHMM with 25% fewer parameters. (C) 2004 Elsevier Ltd. All rights reserved.

AB - A novel multi-level segmental HMM (MSHMM) is presented in which the relationship between symbolic (phonetic) and surface (acoustic) representations of speech is regulated by an intermediate 'articulatory' representation. Speech dynamics are characterised as linear trajectories in the articulatory space, which are transformed into the acoustic space using an articulatory-to-acoustic mapping, and recognition is then performed. The results of phonetic classification experiments are presented for monophone and triphone MSHMMs using three formant-based 'articulatory' parameterisations and sets of between 1 and 49 linear articulatory-to-acoustic mappings. The NIST Matched Pair Sentence Segment (Word Error) test shows that, for a sufficiently rich combination of articulatory parameterisation and mappings, differences between these results and those obtained with an optimal classifier are not statistically significant. It is also shown that, compared with a conventional HMM, superior performance can be achieved using an MSHMM with 25% fewer parameters. (C) 2004 Elsevier Ltd. All rights reserved.

UR - http://www.scopus.com/inward/record.url?scp=10844250035&partnerID=8YFLogxK

U2 - 10.1016/j.csl.2004.08.001

DO - 10.1016/j.csl.2004.08.001

M3 - Article

VL - 19

SP - 205

EP - 225

JO - Computer Speech and Language

JF - Computer Speech and Language

ER -
