Audiovisual asynchrony detection in human speech

Joost X Maier; Massimiliano Di Luca; Uta Noppeney

doi:10.1037/a0019952

Psychology

Research output: Contribution to journal › Article › peer-review

40 Citations (Scopus)

Abstract

Combining information from the visual and auditory senses can greatly enhance intelligibility of natural speech. Integration of audiovisual speech signals is robust even when temporal offsets are present between the component signals. In the present study, we characterized the temporal integration window for speech and nonspeech stimuli with similar spectrotemporal structure to investigate to what extent humans have adapted to the specific characteristics of natural audiovisual speech. We manipulated spectrotemporal structure of the auditory signal, stimulus length, and task context. Results indicate that the temporal integration window is narrower and more asymmetric for speech than for nonspeech signals. When perceiving audiovisual speech, subjects tolerate visual leading asynchronies, but are nevertheless very sensitive to auditory leading asynchronies that are less likely to occur in natural speech. Thus, speech perception may be fine-tuned to the natural statistics of audiovisual speech, where facial movements always occur before acoustic speech articulation.

Original language	English
Pages (from-to)	245-256
Number of pages	12
Journal	Journal of Experimental Psychology: Human Perception and Performance
Volume	37
Issue number	1
DOIs	https://doi.org/10.1037/a0019952
Publication status	Published - 2011

Cite this

@article{a2b0739ba7444822bc74e3383baabc85,
title = "Audiovisual asynchrony detection in human speech",
abstract = "Combining information from the visual and auditory senses can greatly enhance intelligibility of natural speech. Integration of audiovisual speech signals is robust even when temporal offsets are present between the component signals. In the present study, we characterized the temporal integration window for speech and nonspeech stimuli with similar spectrotemporal structure to investigate to what extent humans have adapted to the specific characteristics of natural audiovisual speech. We manipulated spectrotemporal structure of the auditory signal, stimulus length, and task context. Results indicate that the temporal integration window is narrower and more asymmetric for speech than for nonspeech signals. When perceiving audiovisual speech, subjects tolerate visual leading asynchronies, but are nevertheless very sensitive to auditory leading asynchronies that are less likely to occur in natural speech. Thus, speech perception may be fine-tuned to the natural statistics of audiovisual speech, where facial movements always occur before acoustic speech articulation.",
author = "Maier, {Joost X} and {Di Luca}, Massimiliano and Noppeney, Uta",
year = "2011",
doi = "10.1037/a0019952",
language = "English",
volume = "37",
pages = "245--256",
journal = "Journal of Experimental Psychology: Human Perception and Performance",
issn = "1939-1277",
publisher = "American Psychological Association",
number = "1",
}

TY - JOUR
T1 - Audiovisual asynchrony detection in human speech
AU - Maier, Joost X
AU - Di Luca, Massimiliano
AU - Noppeney, Uta
PY - 2011
Y1 - 2011
AB - Combining information from the visual and auditory senses can greatly enhance intelligibility of natural speech. Integration of audiovisual speech signals is robust even when temporal offsets are present between the component signals. In the present study, we characterized the temporal integration window for speech and nonspeech stimuli with similar spectrotemporal structure to investigate to what extent humans have adapted to the specific characteristics of natural audiovisual speech. We manipulated spectrotemporal structure of the auditory signal, stimulus length, and task context. Results indicate that the temporal integration window is narrower and more asymmetric for speech than for nonspeech signals. When perceiving audiovisual speech, subjects tolerate visual leading asynchronies, but are nevertheless very sensitive to auditory leading asynchronies that are less likely to occur in natural speech. Thus, speech perception may be fine-tuned to the natural statistics of audiovisual speech, where facial movements always occur before acoustic speech articulation.
U2 - 10.1037/a0019952
DO - 10.1037/a0019952
M3 - Article
C2 - 20731507
SN - 1939-1277
VL - 37
SP - 245
EP - 256
JO - Journal of Experimental Psychology: Human Perception and Performance
JF - Journal of Experimental Psychology: Human Perception and Performance
IS - 1
ER -
