Consonant Recognition with Continuous-State Hidden Markov Models and Perceptually-Motivated Features

Phil Weber; Colin Champion; Stephen Houghton; Peter Jancovic; Martin Russell

doi:10.21437/Interspeech.2015-418

Electronic, Electrical and Systems Engineering

Research output: Contribution to conference (unpublished) › Paper › peer-review

Abstract

Research into human perception of consonants has identified phoneme-specific perceptual cues. It has also been shown that the characteristics of the speech signal most useful for recognition depend on the specific speech sound. Typical ASR features and recognisers, however, neither vary with the type of sound nor relate directly to perceptual cues. We investigate classification and decoding of non-sonorant consonants using basic perceptually-motivated features: phoneme durations and energy in a few broad spectral bands. Our classification results using simple classifiers suggest that features optimal for human perception also perform best for machine classification. We show how characteristics of the learned models relate to knowledge of human speech perception. Recognition results using a continuous-state HMM (CSHMM) show accuracy similar to that of a discrete-state HMM with similar assumptions. We conclude by outlining how the CSHMM provides a mechanism to make use of other perceptually-important features by integration with similar models for recognition of voiced sounds.
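
As a rough illustration of the "energy in a few broad spectral bands" feature named in the abstract, the sketch below computes the log energy of one speech frame in a handful of frequency bands. It is not the authors' implementation: the function name `band_energies`, the band edges in `BANDS`, the 20 ms frame length, and the use of a plain DFT are all assumptions made for this example, since the record does not state the paper's actual band definitions or analysis settings.

```python
import cmath
import math


def band_energies(frame, sample_rate, band_edges_hz):
    """Return the log energy of one frame in each band [lo, hi), given in Hz."""
    n = len(frame)
    # Naive DFT over the non-negative frequency bins (adequate for short frames).
    power = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)
        )
        power.append(abs(coeff) ** 2)
    bin_hz = sample_rate / n  # frequency spacing between adjacent DFT bins
    energies = []
    for lo, hi in band_edges_hz:
        total = sum(p for k, p in enumerate(power) if lo <= k * bin_hz < hi)
        # Small floor avoids log(0) for bands with no energy.
        energies.append(math.log(total + 1e-10))
    return energies


# Hypothetical band edges in Hz (assumed for illustration, not from the paper).
BANDS = [(0, 1000), (1000, 4000), (4000, 8000)]

# Synthetic 20 ms frame: a 3 kHz sine sampled at 16 kHz.
sr = 16000
frame = [math.sin(2 * math.pi * 3000 * t / sr) for t in range(320)]
features = band_energies(frame, sr, BANDS)
```

For this synthetic frame the middle band (1 to 4 kHz) carries almost all of the energy, so `features[1]` is the largest entry. A per-phoneme feature vector in the spirit of the abstract could append the segment duration to such band energies, though how the paper combines them is not specified here.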

Original language	English
Pages	1893-1897
DOIs	https://doi.org/10.21437/Interspeech.2015-418
Publication status	Published - Sept 2015
Event	Interspeech 2015 - Dresden, Germany Duration: 6 Sept 2015 → 10 Sept 2015

Conference

Conference	Interspeech 2015
Country/Territory	Germany
City	Dresden
Period	6/09/15 → 10/09/15

Access to Document

10.21437/Interspeech.2015-418
Licence: None: All rights reserved

Speech Recognition by Synthesis (SRbS)
Russell, M. & Jancovic, P.
1/10/12 → 31/10/16
Project: Other Government Departments

Cite this

@conference{26fbf85324b247749624e71d5819cc25,

title = "Consonant Recognition with Continuous-State Hidden Markov Models and Perceptually-Motivated Features",

author = "Phil Weber and Colin Champion and Stephen Houghton and Peter Jancovic and Martin Russell",

year = "2015",

month = sep,

doi = "10.21437/Interspeech.2015-418",

language = "English",

pages = "1893--1897",

note = "Interspeech 2015 ; Conference date: 06-09-2015 Through 10-09-2015",

}

TY - CONF

T1 - Consonant Recognition with Continuous-State Hidden Markov Models and Perceptually-Motivated Features

AU - Weber, Phil

AU - Champion, Colin

AU - Houghton, Stephen

AU - Jancovic, Peter

AU - Russell, Martin

PY - 2015/9

Y1 - 2015/9

U2 - 10.21437/Interspeech.2015-418

DO - 10.21437/Interspeech.2015-418

M3 - Paper

SP - 1893

EP - 1897

T2 - Interspeech 2015

Y2 - 6 September 2015 through 10 September 2015

ER -
