Incorporating the voicing information into HMM-based automatic speech recognition in noisy environments

Peter Jancovic; M Kokuer

doi:10.1016/j.specom.2009.01.003

Electronic, Electrical and Systems Engineering

Research output: Contribution to journal › Article › peer-review

12 Citations (Scopus)

Abstract

In this paper, we propose a model for the incorporation of voicing information into a speech recognition system in noisy environments. The employed voicing information is estimated by a novel method that can provide this information for each filter-bank channel and does not require information about the fundamental frequency. The voicing information is modelled by employing the Bernoulli distribution. The voicing model is obtained for each HMM state and mixture by a Viterbi-style training procedure. The proposed voicing incorporation is evaluated both within a standard model and within two other models that compensate for the effect of noise: the missing-feature model and the multi-conditional training model. Experiments are first performed on noisy speech data from the Aurora 2 database. Significant performance improvements are achieved when the voicing information is incorporated within the standard model as well as within the noise-compensated models. The use of voicing information is also demonstrated on a phoneme recognition task on the noise-corrupted TIMIT database, where considerable improvements are observed. © 2009 Elsevier B.V. All rights reserved.
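The abstract does not give the paper's equations, so the following is only a minimal sketch of how a per-channel Bernoulli voicing model could be trained and scored, not the authors' formulation. It assumes binary voiced/unvoiced flags per filter-bank channel, independence between channels, and a simple additive combination with a spectral log-likelihood. All function names and the `weight` parameter are hypothetical.

```python
import math


def estimate_voicing_params(aligned_voicing):
    """Estimate one Bernoulli parameter per filter-bank channel.

    `aligned_voicing` is a list of frames (each a list of 0/1 flags) that a
    Viterbi-style alignment has assigned to a single HMM state and mixture.
    The estimate is the relative frequency of voiced flags in each channel.
    """
    n_frames = len(aligned_voicing)
    n_channels = len(aligned_voicing[0])
    return [
        sum(frame[c] for frame in aligned_voicing) / n_frames
        for c in range(n_channels)
    ]


def bernoulli_log_likelihood(voicing, p_voiced, eps=1e-6):
    """Log-probability of a binary voicing vector, channels assumed independent."""
    total = 0.0
    for v, p in zip(voicing, p_voiced):
        # Clamp to avoid log(0) when a channel was always (un)voiced in training.
        p = min(max(p, eps), 1.0 - eps)
        total += math.log(p) if v else math.log(1.0 - p)
    return total


def combined_log_likelihood(spectral_loglik, voicing, p_voiced, weight=1.0):
    """Add a weighted voicing term to a spectral log-likelihood (assumed form)."""
    return spectral_loglik + weight * bernoulli_log_likelihood(voicing, p_voiced)
```

In such a scheme, `estimate_voicing_params` would be called once per state and mixture after alignment, and `combined_log_likelihood` would replace the plain observation score during decoding.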

Original language	English
Pages (from-to)	438-451
Number of pages	14
Journal	Speech Communication
Volume	51
Issue number	5
DOIs	https://doi.org/10.1016/j.specom.2009.01.003
Publication status	Published - May 2009

Keywords

Automatic speech recognition
Voicing estimation
Phoneme recognition
Missing-feature
Voicing modelling
HMM
Noise robustness
Aurora 2
Source-filter model

Projects

Independent Component Analysis for Speech Signal Enhancement and Representation
Jancovic, P.
Engineering & Physical Science Research Council
7/05/08 → 6/11/10
Project: Research Councils

Feature-Combination for Noise Robust Speech Pattern Processing
Jancovic, P.
Engineering & Physical Science Research Council
2/05/06 → 1/08/08
Project: Research Councils

Cite this

@article{455f86ef335f42b582d67746e49f1084,
  title = "Incorporating the voicing information into HMM-based automatic speech recognition in noisy environments",
  abstract = "In this paper, we propose a model for the incorporation of voicing information into a speech recognition system in noisy environments. The employed voicing information is estimated by a novel method that can provide this information for each filter-bank channel and does not require information about the fundamental frequency. The voicing information is modelled by employing the Bernoulli distribution. The voicing model is obtained for each HMM state and mixture by a Viterbi-style training procedure. The proposed voicing incorporation is evaluated both within a standard model and two other models that had compensated for the noise effect, the missing-feature and the multi-conditional training model. Experiments are first performed on noisy speech data from the Aurora 2 database. Significant performance improvements are achieved when the voicing information is incorporated within the standard model as well as the noise-compensated models. The employment of voicing information is also demonstrated on a phoneme recognition task on the noise-corrupted TIMIT database and considerable improvements are observed. (C) 2009 Elsevier B.V. All rights reserved.",
  keywords = "Automatic speech recognition, Voicing estimation, Phoneme recognition, Missing-feature, Voicing modelling, HMM, Noise robustness, Aurora 2, Source-filter model",
  author = "Peter Jancovic and M Kokuer",
  year = "2009",
  month = may,
  doi = "10.1016/j.specom.2009.01.003",
  language = "English",
  volume = "51",
  pages = "438--451",
  journal = "Speech Communication",
  publisher = "Elsevier",
  number = "5",
}

TY  - JOUR
T1  - Incorporating the voicing information into HMM-based automatic speech recognition in noisy environments
AU  - Jancovic, Peter
AU  - Kokuer, M
PY  - 2009/5
Y1  - 2009/5
N2  - In this paper, we propose a model for the incorporation of voicing information into a speech recognition system in noisy environments. The employed voicing information is estimated by a novel method that can provide this information for each filter-bank channel and does not require information about the fundamental frequency. The voicing information is modelled by employing the Bernoulli distribution. The voicing model is obtained for each HMM state and mixture by a Viterbi-style training procedure. The proposed voicing incorporation is evaluated both within a standard model and two other models that had compensated for the noise effect, the missing-feature and the multi-conditional training model. Experiments are first performed on noisy speech data from the Aurora 2 database. Significant performance improvements are achieved when the voicing information is incorporated within the standard model as well as the noise-compensated models. The employment of voicing information is also demonstrated on a phoneme recognition task on the noise-corrupted TIMIT database and considerable improvements are observed. (C) 2009 Elsevier B.V. All rights reserved.
AB  - In this paper, we propose a model for the incorporation of voicing information into a speech recognition system in noisy environments. The employed voicing information is estimated by a novel method that can provide this information for each filter-bank channel and does not require information about the fundamental frequency. The voicing information is modelled by employing the Bernoulli distribution. The voicing model is obtained for each HMM state and mixture by a Viterbi-style training procedure. The proposed voicing incorporation is evaluated both within a standard model and two other models that had compensated for the noise effect, the missing-feature and the multi-conditional training model. Experiments are first performed on noisy speech data from the Aurora 2 database. Significant performance improvements are achieved when the voicing information is incorporated within the standard model as well as the noise-compensated models. The employment of voicing information is also demonstrated on a phoneme recognition task on the noise-corrupted TIMIT database and considerable improvements are observed. (C) 2009 Elsevier B.V. All rights reserved.
KW  - Automatic speech recognition
KW  - Voicing estimation
KW  - Phoneme recognition
KW  - Missing-feature
KW  - Voicing modelling
KW  - HMM
KW  - Noise robustness
KW  - Aurora 2
KW  - Source-filter model
U2  - 10.1016/j.specom.2009.01.003
DO  - 10.1016/j.specom.2009.01.003
M3  - Article
VL  - 51
SP  - 438
EP  - 451
JO  - Speech Communication
JF  - Speech Communication
IS  - 5
ER  - 
