Abstract
Recent advances in automatic speech recognition have used large corpora and powerful computational resources to train complex statistical models from high-dimensional features, in an attempt to capture all the variability found in natural speech. Such models are difficult to interpret, may be fragile, and contradict or ignore knowledge of human speech production and perception. We report progress towards phoneme recognition using a model of speech which employs very few parameters and which is more faithful to the dynamics and to models of human speech production. Using features generated from a neural network bottleneck layer, we obtain recognition accuracy on TIMIT which compares favourably with traditional models of similar power. We discuss the implications of these results for recognition using natural features such as vocal tract resonances and spectral energies.
Original language | English |
---|---|
Title of host publication | 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Proceedings |
Publisher | IEEE Xplore |
Number of pages | 5 |
ISSN (Electronic) | 2379-190X |
Publication status | E-pub ahead of print - 19 May 2016 |
Event | IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP) 2016 - Shanghai, China. Duration: 20 Mar 2016 → 25 Mar 2016. http://www.icassp2016.org |
Conference
Conference | IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP) 2016 |
---|---|
Country/Territory | China |
City | Shanghai |
Period | 20/03/16 → 25/03/16 |
Internet address | http://www.icassp2016.org |
Keywords
- Hidden Markov models
- speech
- training
- video recording
- speech recognition
- computational modeling
- data models