Analysis of a low-dimensional bottleneck neural network representation of speech for modelling speech dynamics

Linxue Bai, Peter Jancovic, Martin Russell, Phil Weber

Research output: Contribution to conference (unpublished) › Paper › peer-review

Abstract

This paper presents an analysis of a low-dimensional representation of speech for modelling speech dynamics, extracted using bottleneck neural networks. The input to the neural network is a set of spectral feature vectors. We explore the effect of various network design and training choices, such as varying the size of the context in the input layer, the sizes of the bottleneck and other hidden layers, and using either input reconstruction or phone posteriors as targets. Experiments are performed on TIMIT. The bottleneck features are employed in a conventional HMM-based phoneme recognition system, which achieves a recognition accuracy of 70.6% on the core test set using only 9-dimensional features. We also analyse how well the bottleneck features fit the assumptions of dynamic models of speech. Specifically, we employ the continuous-state hidden Markov model (CS-HMM), which considers speech as a sequence of dwell and transition regions. We demonstrate that the bottleneck features preserve trajectory continuity over time well and can provide a suitable representation for the CS-HMM.
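As a rough illustration of the kind of feature extractor the abstract describes, the following Python sketch stacks a context window of spectral frames and passes it through a small feed-forward network whose narrow middle layer yields 9-dimensional features. Only the 9-dimensional bottleneck comes from the abstract. The function names, the 23-dimensional input, the context of 5 frames, the 512-unit hidden layers, the tanh activations, and the random (untrained) weights are all assumptions made for illustration, not the authors' configuration.

```python
import numpy as np


def stack_context(frames, context):
    """Concatenate each frame with `context` neighbours on each side.

    Edges are padded by repeating the first/last frame (an assumption).
    """
    padded = np.pad(frames, ((context, context), (0, 0)), mode="edge")
    num_frames = frames.shape[0]
    windows = [padded[i:i + num_frames] for i in range(2 * context + 1)]
    return np.concatenate(windows, axis=1)


def init_layers(sizes, rng):
    """Create random (untrained) weights and zero biases for each layer."""
    return [
        (rng.standard_normal((m, n)) * 0.1, np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]


def bottleneck_features(x, layers, bottleneck_layer):
    """Run the network up to and including the bottleneck layer."""
    h = x
    for w, b in layers[:bottleneck_layer + 1]:
        h = np.tanh(h @ w + b)  # tanh is an assumed activation
    return h


rng = np.random.default_rng(0)
feature_dim, context = 23, 5                 # hypothetical input settings
frames = rng.standard_normal((100, feature_dim))  # stand-in for spectral features

x = stack_context(frames, context)
# Layer sizes: input -> hidden -> 9-dim bottleneck -> hidden -> reconstruction
sizes = [x.shape[1], 512, 9, 512, x.shape[1]]
layers = init_layers(sizes, rng)
features = bottleneck_features(x, layers, bottleneck_layer=1)
print(x.shape, features.shape)
```

In a trained system the weights would be learned with either input reconstruction or phone posteriors as targets, and the per-frame bottleneck outputs would then serve as the features passed to the recogniser.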
Original language: English
Pages: 583-587
Publication status: Published - Sept 2015
Event: Interspeech 2015 - Dresden, Germany
Duration: 6 Sept 2015 to 10 Sept 2015

Conference

Conference: Interspeech 2015
Country/Territory: Germany
City: Dresden
Period: 6/09/15 to 10/09/15
