Exploiting a 'gaze-Lombard effect' to improve ASR performance in acoustically noisy settings

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Authors

Colleges, School and Institutes

External organisations

  • School of Electronic, Electrical and Computer Engineering, University of Birmingham

Abstract

Previous uses of gaze (eye movement) to improve Automatic Speech Recognition (ASR) performance involve shifting language model probability mass towards the subset of the vocabulary whose words are related to a person's visual attention. Motivated to improve ASR performance in acoustically noisy settings by using gaze information selectively, we propose 'Selective Gaze-contingent ASR' (SGC-ASR). By modelling the relationship between gaze and speech conditioned on noise level (a 'gaze-Lombard effect'), simultaneous dynamic adaptation of the acoustic models and the language model is achieved. Evaluation on a matched set of gaze and speech data recorded under a varying speech-babble noise condition yields WER improvements. The work highlights the use of gaze information in dynamic model-based adaptation methods for noise-robust ASR. © 2014 IEEE.
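The core mechanism mentioned above, shifting language model probability mass towards gaze-related words, can be illustrated with a minimal sketch. The function name, the unigram representation, and the interpolation weight are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of gaze-contingent language model adaptation:
# probability mass is shifted towards the subset of the vocabulary
# associated with the user's current visual attention. Names and the
# `gaze_weight` parameter are assumptions for illustration only.

def gaze_adapted_unigram(base_probs, gazed_words, gaze_weight=0.3):
    """Interpolate a base unigram LM with a uniform distribution over
    the gaze-related words, then renormalise."""
    gazed = set(gazed_words) & set(base_probs)
    if not gazed:
        return dict(base_probs)  # no overlap with the vocabulary: leave LM unchanged
    boost = gaze_weight / len(gazed)
    adapted = {
        w: (1.0 - gaze_weight) * p + (boost if w in gazed else 0.0)
        for w, p in base_probs.items()
    }
    total = sum(adapted.values())
    return {w: p / total for w, p in adapted.items()}

# Example: vocabulary probabilities before and after the gaze-driven shift.
base = {"cup": 0.05, "table": 0.05, "move": 0.40, "the": 0.50}
print(gaze_adapted_unigram(base, gazed_words={"cup"}))
```

In the paper's setting, the adaptation would additionally be conditioned on the estimated noise level (the 'gaze-Lombard effect') and applied jointly with acoustic model adaptation; the sketch covers only the language model side.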

Details

Original language: English
Title of host publication: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Publication status: Published - May 2014
Event: ICASSP 2014 - 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) - Florence, Italy
Duration: 4 May 2014 - 9 May 2014

Conference

Conference: ICASSP 2014 - 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Country: Italy
Period: 4/05/14 - 9/05/14

Keywords

  • Acoustic Model adaptation, acoustic noise, ASR, gaze, Language Model adaptation, Mutual information, noise robust ASR, eye movement, speech, visual attention