Incorporating the voicing information into HMM-based automatic speech recognition in noisy environments

Peter Jancovic, M Kokuer

Research output: Contribution to journal › Article › peer-review

12 Citations (Scopus)

Abstract

In this paper, we propose a model for incorporating voicing information into a speech recognition system operating in noisy environments. The voicing information is estimated by a novel method that provides this information for each filter-bank channel and does not require knowledge of the fundamental frequency. The voicing information is modelled by a Bernoulli distribution, and a voicing model is obtained for each HMM state and mixture component by a Viterbi-style training procedure. The proposed voicing incorporation is evaluated within a standard model as well as within two models that compensate for the noise effect: the missing-feature model and the multi-conditional training model. Experiments are first performed on noisy speech data from the Aurora 2 database. Significant performance improvements are achieved when the voicing information is incorporated within the standard model as well as the noise-compensated models. The employment of voicing information is also demonstrated on a phoneme recognition task on the noise-corrupted TIMIT database, where considerable improvements are observed. (C) 2009 Elsevier B.V. All rights reserved.
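The abstract describes attaching a Bernoulli-distributed, per-channel voicing feature to each HMM state and mixture component alongside the usual spectral features. The sketch below illustrates, under stated assumptions, how such a Bernoulli voicing likelihood could be combined with a diagonal-covariance Gaussian mixture observation likelihood at decoding time; all names (mixture_log_likelihood, voicing_probs, etc.) are illustrative and do not reflect the authors' actual implementation.

```python
import numpy as np

def mixture_log_likelihood(features, voicing, means, variances, weights, voicing_probs):
    """Frame log-likelihood under a GMM whose components also carry Bernoulli
    voicing parameters (illustrative sketch, not the paper's code).

    features      : (D,)   spectral feature vector for the current frame
    voicing       : (C,)   binary voicing decision per filter-bank channel (0/1)
    means         : (M, D) Gaussian means per mixture component
    variances     : (M, D) diagonal Gaussian variances per mixture component
    weights       : (M,)   mixture weights
    voicing_probs : (M, C) Bernoulli parameter p(voiced) per component and channel
    """
    # Diagonal-covariance Gaussian log-density for each mixture component.
    diff = features[None, :] - means
    gauss_ll = -0.5 * np.sum(
        np.log(2.0 * np.pi * variances) + diff ** 2 / variances, axis=1
    )

    # Bernoulli log-likelihood of the per-channel voicing pattern.
    p = np.clip(voicing_probs, 1e-6, 1.0 - 1e-6)
    bern_ll = np.sum(
        voicing[None, :] * np.log(p)
        + (1.0 - voicing[None, :]) * np.log(1.0 - p),
        axis=1,
    )

    # Combine spectral and voicing evidence, then sum over mixture components
    # with a numerically stable log-sum-exp.
    joint = np.log(weights) + gauss_ll + bern_ll
    m = np.max(joint)
    return m + np.log(np.sum(np.exp(joint - m)))
```

Per the abstract, the Bernoulli parameters themselves are estimated for each HMM state and mixture by a Viterbi-style training procedure; the sketch above only shows how the two likelihood terms would combine when scoring a frame.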
Original language: English
Pages (from-to): 438-451
Number of pages: 14
Journal: Speech Communication
Volume: 51
Issue number: 5
DOIs
Publication status: Published - May 2009

Keywords

  • Automatic speech recognition
  • Voicing estimation
  • Phoneme recognition
  • Missing-feature
  • Voicing modelling
  • HMM
  • Noise robustness
  • Aurora 2
  • Source-filter model
