Exploiting a 'gaze-Lombard effect' to improve ASR performance in acoustically noisy settings
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution
Authors
Colleges, School and Institutes
External organisations
- School of Electronic, Electrical and Computer Engineering, University of Birmingham
Abstract
Previous use of gaze (eye movement) to improve ASR performance involves shifting language model probability mass towards the subset of the vocabulary whose words are related to a person's visual attention. Motivated to improve Automatic Speech Recognition (ASR) performance in acoustically noisy settings by using information from gaze selectively, we propose a 'Selective Gaze-contingent ASR' (SGC-ASR) approach. By modelling the relationship between gaze and speech conditioned on noise level, a 'gaze-Lombard effect', simultaneous dynamic adaptation of the acoustic models and the language model is achieved. Evaluation on a matched set of gaze and speech data recorded under varying speech babble noise conditions yields word error rate (WER) improvements. The work highlights the use of gaze information in dynamic model-based adaptation methods for noise robust ASR. © 2014 IEEE.
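The core idea in the abstract, shifting language model probability mass towards words associated with the current point of visual attention, and doing so more strongly as acoustic noise increases, can be sketched with a small example. The unigram interpolation below is a hypothetical illustration only: the function name, the SNR-to-weight mapping, and the uniform gaze distribution are assumptions made for exposition, not the SGC-ASR method described in the paper.

```python
# Minimal sketch (not the authors' implementation): interpolate a baseline
# unigram language model with a gaze-derived distribution, where the
# interpolation weight grows as the estimated SNR drops.

def adapt_unigram_lm(base_lm, gazed_words, noise_snr_db,
                     max_shift=0.5, snr_floor_db=0.0, snr_ceil_db=30.0):
    """Shift probability mass towards the gazed vocabulary subset.

    base_lm      : dict mapping word -> probability (sums to 1.0)
    gazed_words  : set of words associated with the current visual attention
    noise_snr_db : estimated signal-to-noise ratio; lower SNR -> larger shift
    Returns a new dict of adapted probabilities (still sums to 1.0).
    """
    # Map SNR to an interpolation weight in [0, max_shift]:
    # noisier audio (low SNR) relies more on the gaze-derived distribution.
    snr = min(max(noise_snr_db, snr_floor_db), snr_ceil_db)
    lam = max_shift * (snr_ceil_db - snr) / (snr_ceil_db - snr_floor_db)

    # Uniform distribution over the attended vocabulary subset (an assumption).
    gaze_prob = 1.0 / len(gazed_words) if gazed_words else 0.0

    adapted = {}
    for word, p in base_lm.items():
        p_gaze = gaze_prob if word in gazed_words else 0.0
        adapted[word] = (1.0 - lam) * p + lam * p_gaze
    return adapted


if __name__ == "__main__":
    lm = {"cat": 0.25, "dog": 0.25, "car": 0.25, "bus": 0.25}
    # In quiet conditions the shift is small; in heavy babble noise it grows.
    print(adapt_unigram_lm(lm, {"car", "bus"}, noise_snr_db=25.0))
    print(adapt_unigram_lm(lm, {"car", "bus"}, noise_snr_db=5.0))
```

In a full system the gaze-derived distribution would come from the objects or text at the fixation point, and, as the abstract notes, adaptation would also be applied to the acoustic models, which this toy example does not attempt.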
Details
Original language | English |
---|---|
Title of host publication | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings |
Publication status | Published - May 2014 |
Event | ICASSP 2014 - 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) - Florence, Italy. Duration: 4 May 2014 → 9 May 2014 |
Conference
Conference | ICASSP 2014 - 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
---|---|
Country | Italy |
Period | 4/05/14 → 9/05/14 |
Keywords
- Acoustic Model adaptation, acoustic noise, ASR, gaze, Language Model adaptation, Mutual information, noise robust ASR, eye movement, speech, visual attention