Exploiting a 'gaze-Lombard effect' to improve ASR performance in acoustically noisy settings
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution
Authors
Colleges, School and Institutes
External organisations
- School of Electronic, Electrical and Computer Engineering, University of Birmingham
Abstract
Previous use of gaze (eye movement) to improve ASR performance involves shifting language model probability mass towards the subset of the vocabulary whose words are related to a person's visual attention. Motivated to improve Automatic Speech Recognition (ASR) performance in acoustically noisy settings by using information from gaze selectively, we propose a 'Selective Gaze-contingent ASR' (SGC-ASR) approach. By modelling the relationship between gaze and speech conditioned on noise level, a 'gaze-Lombard effect', simultaneous dynamic adaptation of the acoustic models and the language model is achieved. Evaluation on a matched set of gaze and speech data recorded under varying speech babble noise conditions yields word error rate (WER) improvements. The work highlights the use of gaze information in dynamic model-based adaptation methods for noise robust ASR. © 2014 IEEE.
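The core idea in the abstract, shifting language model probability mass towards words associated with the current point of visual attention, and doing so more strongly as acoustic noise increases, can be sketched with a small example. The unigram interpolation below is a hypothetical illustration only: the function name, the SNR-to-weight mapping, and the uniform gaze distribution are assumptions made for exposition, not the SGC-ASR method described in the paper.

```python
# Minimal sketch (not the authors' implementation): interpolate a baseline
# unigram language model with a gaze-derived distribution, where the
# interpolation weight grows as the estimated SNR drops.

def adapt_unigram_lm(base_lm, gazed_words, noise_snr_db,
                     max_shift=0.5, snr_floor_db=0.0, snr_ceil_db=30.0):
    """Shift probability mass towards the gazed vocabulary subset.

    base_lm      : dict mapping word -> probability (sums to 1.0)
    gazed_words  : set of words associated with the current visual attention
    noise_snr_db : estimated signal-to-noise ratio; lower SNR -> larger shift
    Returns a new dict of adapted probabilities (still sums to 1.0).
    """
    # Map SNR to an interpolation weight in [0, max_shift]:
    # noisier audio (low SNR) relies more on the gaze-derived distribution.
    snr = min(max(noise_snr_db, snr_floor_db), snr_ceil_db)
    lam = max_shift * (snr_ceil_db - snr) / (snr_ceil_db - snr_floor_db)

    # Uniform distribution over the attended vocabulary subset (an assumption).
    gaze_prob = 1.0 / len(gazed_words) if gazed_words else 0.0

    adapted = {}
    for word, p in base_lm.items():
        p_gaze = gaze_prob if word in gazed_words else 0.0
        adapted[word] = (1.0 - lam) * p + lam * p_gaze
    return adapted


if __name__ == "__main__":
    lm = {"cat": 0.25, "dog": 0.25, "car": 0.25, "bus": 0.25}
    # In quiet conditions the shift is small; in heavy babble noise it grows.
    print(adapt_unigram_lm(lm, {"car", "bus"}, noise_snr_db=25.0))
    print(adapt_unigram_lm(lm, {"car", "bus"}, noise_snr_db=5.0))
```

In a full system the gaze-derived distribution would come from the objects or text at the fixation point, and, as the abstract notes, adaptation would also be applied to the acoustic models, which this toy example does not attempt.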
Details
Original language | English |
---|---|
Title of host publication | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings |
Publication status | Published - May 2014 |
Event | ICASSP 2014 - 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) - Florence, Italy. Duration: 4 May 2014 → 9 May 2014 |
Conference
Conference | ICASSP 2014 - 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
---|---|
Country | Italy |
Period | 4/05/14 → 9/05/14 |
Keywords
- Acoustic Model adaptation, acoustic noise, ASR, gaze, Language Model adaptation, Mutual information, noise robust ASR, eye movement, speech, visual attention