Abstract
This paper presents an analysis of a low-dimensional representation of speech for modelling speech dynamics, extracted using bottleneck neural networks. The input to the neural network is a set of spectral feature vectors. We explore the effect of various network design and training choices, such as varying the size of the context in the input layer, the size of the bottleneck and other hidden layers, and using input reconstruction or phone posteriors as targets. Experiments are performed on TIMIT. The bottleneck features are employed in a conventional HMM-based phoneme recognition system, which achieves a recognition accuracy of 70.6% on the core test set using only 9-dimensional features. We also analyse how well the bottleneck features fit the assumptions of dynamic models of speech. Specifically, we employ the continuous-state hidden Markov model (CS-HMM), which considers speech as a sequence of dwell and transition regions. We demonstrate that the bottleneck features preserve trajectory continuity over time well and can provide a suitable representation for CS-HMM.
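As a rough illustration of the feature extraction the abstract describes, the sketch below stacks each spectral frame with its neighbouring frames, passes the result through one hidden layer, and reads out a 9-dimensional bottleneck layer. Only the bottleneck size of 9 comes from the abstract. The spectral dimension, context width, hidden-layer size, sigmoid activation, linear bottleneck, and random (untrained) weights are all assumptions made for illustration, and the function names are hypothetical rather than taken from the paper.

```python
import math
import random


def stack_context(frames, context):
    """Concatenate each frame with `context` neighbours per side (edge frames repeat)."""
    n = len(frames)
    stacked = []
    for t in range(n):
        window = []
        for k in range(t - context, t + context + 1):
            window.extend(frames[min(max(k, 0), n - 1)])  # clamp index at the edges
        stacked.append(window)
    return stacked


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases; a trained network would load learned values."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(x, layer, activation=None):
    """Affine transform of vector x, optionally followed by an element-wise activation."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [activation(o) for o in out] if activation else out


def bottleneck_features(frames, layers, context):
    """Run only the layers up to the bottleneck and return its activations per frame."""
    feats = []
    for x in stack_context(frames, context):
        h = dense(x, layers["hidden"], sigmoid)
        feats.append(dense(h, layers["bottleneck"]))  # linear bottleneck (assumption)
    return feats


rng = random.Random(0)
# Only N_BOTTLENECK = 9 is taken from the abstract; the other sizes are assumptions.
N_SPECTRAL, CONTEXT, N_HIDDEN, N_BOTTLENECK = 26, 5, 64, 9
layers = {
    "hidden": init_layer(N_SPECTRAL * (2 * CONTEXT + 1), N_HIDDEN, rng),
    "bottleneck": init_layer(N_HIDDEN, N_BOTTLENECK, rng),
}
# Synthetic stand-in for spectral feature vectors (20 frames).
frames = [[rng.gauss(0.0, 1.0) for _ in range(N_SPECTRAL)] for _ in range(20)]
features = bottleneck_features(frames, layers, CONTEXT)
```

In a full system the layers after the bottleneck would be trained against either the reconstructed input or phone posteriors and then discarded, leaving the bottleneck activations as the per-frame features passed to the recogniser.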
Original language | English |
---|---|
Pages | 583-587 |
Publication status | Published - Sept 2015 |
Event | Interspeech 2015 - Dresden, Germany Duration: 6 Sept 2015 → 10 Sept 2015 |
Conference
Conference | Interspeech 2015 |
---|---|
Country/Territory | Germany |
City | Dresden |
Period | 6/09/15 → 10/09/15 |
Publication title: 'Analysis of a low-dimensional bottleneck neural network representation of speech for modelling speech dynamics'
Projects
- Speech Recognition by Synthesis (SRbS) (finished)
  Russell, M. & Jancovic, P.
  1/10/12 → 31/10/16
  Project: Other Government Departments