Abstract
Previous uses of gaze (eye movement) to improve ASR performance involve shifting language model probability mass towards the subset of the vocabulary whose words are related to a person's visual attention. Motivated to improve Automatic Speech Recognition (ASR) performance in acoustically noisy settings by using information from gaze selectively, we propose a 'Selective Gaze-contingent ASR' (SGC-ASR) approach. By modelling the relationship between gaze and speech conditioned on noise level (a 'gaze-Lombard effect'), simultaneous dynamic adaptation of the acoustic models and the language model is achieved. Evaluation on a matched set of gaze and speech data recorded under varying speech-babble noise conditions yields word error rate (WER) improvements. The work highlights the use of gaze information in dynamic model-based adaptation methods for noise-robust ASR. © 2014 IEEE.
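The language-model side of the idea described above — shifting probability mass towards words related to the current visual attention — can be sketched as a simple reweight-and-renormalize step. This is a minimal illustrative sketch, not the paper's actual method: the function name, the unigram representation, and the `boost` factor are all assumptions for the example.

```python
def adapt_lm(unigram_probs, gazed_words, boost=3.0):
    """Gaze-contingent LM adaptation sketch (illustrative only):
    multiply the probability of words related to the gaze target by
    `boost`, then renormalize so the distribution still sums to 1."""
    reweighted = {
        w: p * (boost if w in gazed_words else 1.0)
        for w, p in unigram_probs.items()
    }
    total = sum(reweighted.values())
    return {w: p / total for w, p in reweighted.items()}

# Toy vocabulary: the user is currently looking at a car.
lm = {"cat": 0.25, "dog": 0.25, "car": 0.25, "tree": 0.25}
adapted = adapt_lm(lm, gazed_words={"car"})
# Probability mass shifts towards the attended word "car"
# while the distribution remains normalized.
```

A selective scheme like SGC-ASR would apply such adaptation conditioned on the estimated noise level rather than unconditionally.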
Original language | English |
---|---|
Title of host publication | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings |
Publisher | IEEE Press / Wiley |
Pages | 1754-1758 |
Number of pages | 5 |
ISBN (Print) | 9781479928927 |
DOIs | |
Publication status | Published - May 2014 |
Event | 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014 - Florence, Italy Duration: 4 May 2014 → 9 May 2014 |
Conference
Conference | 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014 |
---|---|
Country/Territory | Italy |
City | Florence |
Period | 4/05/14 → 9/05/14 |
Keywords
- Acoustic Model adaptation
- acoustic noise
- ASR
- gaze
- Language Model adaptation
- Mutual information
- noise robust ASR
- eye movement
- speech
- visual attention
ASJC Scopus subject areas
- Signal Processing
- Software
- Electrical and Electronic Engineering