Exploiting a 'gaze-Lombard effect' to improve ASR performance in acoustically noisy settings

Neil Cooke, Ao Shen, Martin Russell

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

Previous use of gaze (eye movement) to improve Automatic Speech Recognition (ASR) performance involves shifting language model probability mass towards the subset of the vocabulary whose words are related to a person's visual attention. Motivated to improve ASR performance in acoustically noisy settings by using information from gaze selectively, we propose a 'Selective Gaze-contingent ASR' (SGC-ASR). By modelling the relationship between gaze and speech conditioned on noise level - a 'gaze-Lombard effect' - simultaneous dynamic adaptation of the acoustic models and the language model is achieved. Evaluation on a matched set of gaze and speech data recorded under varying speech babble noise conditions yields word error rate (WER) improvements. The work highlights the use of gaze information in dynamic model-based adaptation methods for noise-robust ASR. © 2014 IEEE.

Original language: English
Title of host publication: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Publisher: IEEE Press / Wiley
Pages: 1754-1758
Number of pages: 5
ISBN (Print): 9781479928927
DOIs
Publication status: Published - May 2014
Event: 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014 - Florence, Italy
Duration: 4 May 2014 – 9 May 2014

Conference

Conference: 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014
Country/Territory: Italy
City: Florence
Period: 4/05/14 – 9/05/14

Keywords

  • Acoustic Model adaptation
  • acoustic noise
  • ASR
  • gaze
  • Language Model adaptation
  • Mutual information
  • noise robust ASR
  • eye movement
  • speech
  • visual attention

ASJC Scopus subject areas

  • Signal Processing
  • Software
  • Electrical and Electronic Engineering
