Modelling Speech Signals Using Formant Frequencies as an Intermediate Representation

Martin Russell; [No Value] [No Value]; PJB Jackson

doi:10.1049/iet-spr:20060179

Electronic, Electrical and Systems Engineering

Research output: Contribution to journal › Article

4 Citations (Scopus)

Abstract

Multiple-level segmental hidden Markov models (M-SHMMs) in which the relationship between symbolic and acoustic representations of speech is regulated by a formant-based intermediate representation are considered. New TIMIT phone recognition results are presented, confirming that the theoretical upper bound on performance is achieved provided that either the intermediate representation or the formant-to-acoustic mapping is sufficiently rich. The way in which M-SHMMs exploit formant-based information is also investigated, using singular value decomposition of the formant-to-acoustic mappings and linear discriminant analysis. The analysis shows that if the intermediate layer contains information that is linearly related to the spectral representation, that information is used in preference to explicit formant frequencies, even though the latter are useful for phone discrimination. In summary, although these results confirm the utility of M-SHMMs for automatic speech recognition, they provide empirical evidence of the value of nonlinear formant-to-acoustic mappings.
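Singular value decomposition applies to a matrix, so the abstract's SVD analysis suggests the formant-to-acoustic mappings are linear (y = Wx + b). As a hedged illustration only, the Python sketch below assumes a single such mapping from a four-dimensional intermediate vector (F1, F2, F3 and one hypothetical "extra" feature assumed to be linearly related to the spectrum) to a three-dimensional acoustic vector. The sizes, the matrix values and the function names `formant_to_acoustic` and `dominant_intermediate_dims` are invented for this example and are not taken from the paper. The SVD of W shows which intermediate dimensions dominate the direction that the mapping amplifies most.

```python
import numpy as np

# Hypothetical sizes: a 4-dimensional intermediate layer (F1, F2, F3 plus one
# extra feature assumed to be linearly related to the spectrum) and a
# 3-dimensional acoustic vector. Real acoustic vectors are much larger.
INTERMEDIATE_NAMES = ["F1", "F2", "F3", "extra"]


def formant_to_acoustic(W, b, x):
    """Linear mapping of intermediate vectors x, shape (T, d_int), to acoustic vectors, shape (T, d_ac)."""
    return x @ W.T + b


def dominant_intermediate_dims(W):
    """SVD of a mapping matrix W with shape (d_ac, d_int).

    Returns the singular values (in descending order) and the squared entries
    of the right singular vector paired with the largest singular value, i.e.
    how much each intermediate dimension contributes to the direction the
    mapping amplifies most. The squared entries sum to 1.
    """
    _, s, Vt = np.linalg.svd(W, full_matrices=False)
    return s, Vt[0] ** 2


# Illustrative (made-up) mapping: small gains on the formant columns,
# a large gain on the "extra" column.
W = np.array([[0.1, 0.0, 0.0, 2.0],
              [0.0, 0.1, 0.0, 1.0],
              [0.0, 0.0, 0.1, 0.5]])
b = np.zeros(3)

x = np.array([[0.5, 1.5, 2.5, 1.0]])  # one frame, arbitrary scaled units
y = formant_to_acoustic(W, b, x)

s, weights = dominant_intermediate_dims(W)
for name, w in zip(INTERMEDIATE_NAMES, weights):
    print(f"{name}: {w:.3f}")
```

With these made-up numbers nearly all of the weight falls on the `extra` dimension. This is only a toy analogue of the abstract's finding that linearly related information is used in preference to explicit formant frequencies; it says nothing about the values reported in the paper.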

Original language	English
Pages (from-to)	43-50
Number of pages	8
Journal	IET Signal Processing
Volume	1
Issue number	1
DOIs	https://doi.org/10.1049/iet-spr:20060179
Publication status	Published - 1 Jan 2007

A Unified Model for Speech Recognition and Synthesis
Russell, M.
Engineering & Physical Science Research Council
14/09/05 → 13/03/09
Project: Research Councils

Cite this

@article{f41a5fe61ef54df9a28c60548ec961eb,
  title = "Modelling Speech Signals Using Formant Frequencies as an Intermediate Representation",
  abstract = "Multiple-level segmental hidden Markov models (M-SHMMs) in which the relationship between symbolic and acoustic representations of speech is regulated by a formant-based intermediate representation are considered. New TIMIT phone recognition results are presented, confirming that the theoretical upper bound on performance is achieved provided that either the intermediate representation or the formant-to-acoustic mapping is sufficiently rich. The way in which M-SHMMs exploit formant-based information is also investigated, using singular value decomposition of the formant-to-acoustic mappings and linear discriminant analysis. The analysis shows that if the intermediate layer contains information that is linearly related to the spectral representation, that information is used in preference to explicit formant frequencies, even though the latter are useful for phone discrimination. In summary, although these results confirm the utility of M-SHMMs for automatic speech recognition, they provide empirical evidence of the value of nonlinear formant-to-acoustic mappings.",
  author = "Martin Russell and {[No Value]}, {[No Value]} and PJB Jackson",
  year = "2007",
  month = jan,
  day = "1",
  doi = "10.1049/iet-spr:20060179",
  language = "English",
  volume = "1",
  pages = "43--50",
  journal = "IET Signal Processing",
  issn = "1751-9683",
  publisher = "Institution of Engineering and Technology",
  number = "1",
}

TY  - JOUR
T1  - Modelling Speech Signals Using Formant Frequencies as an Intermediate Representation
AU  - Russell, Martin
AU  - [No Value], [No Value]
AU  - Jackson, PJB
PY  - 2007/1/1
Y1  - 2007/1/1
N2  - Multiple-level segmental hidden Markov models (M-SHMMs) in which the relationship between symbolic and acoustic representations of speech is regulated by a formant-based intermediate representation are considered. New TIMIT phone recognition results are presented, confirming that the theoretical upper bound on performance is achieved provided that either the intermediate representation or the formant-to-acoustic mapping is sufficiently rich. The way in which M-SHMMs exploit formant-based information is also investigated, using singular value decomposition of the formant-to-acoustic mappings and linear discriminant analysis. The analysis shows that if the intermediate layer contains information that is linearly related to the spectral representation, that information is used in preference to explicit formant frequencies, even though the latter are useful for phone discrimination. In summary, although these results confirm the utility of M-SHMMs for automatic speech recognition, they provide empirical evidence of the value of nonlinear formant-to-acoustic mappings.
AB  - Multiple-level segmental hidden Markov models (M-SHMMs) in which the relationship between symbolic and acoustic representations of speech is regulated by a formant-based intermediate representation are considered. New TIMIT phone recognition results are presented, confirming that the theoretical upper bound on performance is achieved provided that either the intermediate representation or the formant-to-acoustic mapping is sufficiently rich. The way in which M-SHMMs exploit formant-based information is also investigated, using singular value decomposition of the formant-to-acoustic mappings and linear discriminant analysis. The analysis shows that if the intermediate layer contains information that is linearly related to the spectral representation, that information is used in preference to explicit formant frequencies, even though the latter are useful for phone discrimination. In summary, although these results confirm the utility of M-SHMMs for automatic speech recognition, they provide empirical evidence of the value of nonlinear formant-to-acoustic mappings.
U2  - 10.1049/iet-spr:20060179
DO  - 10.1049/iet-spr:20060179
M3  - Article
SN  - 1751-9683
VL  - 1
SP  - 43
EP  - 50
JO  - IET Signal Processing
JF  - IET Signal Processing
IS  - 1
ER  -
