Analysis of phone errors in computer recognition of children’s speech

Research output: Contribution to conference (unpublished) - Paper

Authors

Colleges, School and Institutes

External organisations

  • Disney Research Pittsburgh (DRP)

Abstract

Automatic speech recognition (ASR) for children’s speech is
more difficult than for adults’ speech. This paper explores two
explanations of this phenomenon, namely (A) that it is due to
predictable phonological effects associated with language acquisition
in children, or (B) that it is due to the general increase
in acoustic variability that has been observed in children’s
speech. Phone recognition experiments are conducted on
hand-labelled data for children aged between 5 and 6. A statistical
comparison of the resulting confusion matrix with that for
adult speech (TIMIT) shows significant increases in phone substitution
rates for children, some of which correspond to established
phonological phenomena (type A errors). However, these
only account for a small proportion of errors, and those associated
with general acoustic variability (type B) appear to account
for the majority. The study also shows significantly more deletion
errors in ASR for children’s speech. Overall, the results
suggest that attempts to improve ASR accuracy for children’s
speech by accommodating phonological phenomena associated
with language acquisition, for example by changing the pronunciation
dictionary, are unlikely to deliver significant success in
the short term, and that coping with the increased acoustic variability
in children’s speech should be the immediate priority.
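
The abstract does not say which statistical test was used to compare the two confusion matrices. As a rough illustration of that kind of comparison, the sketch below applies a two-proportion z-test to each substitution (reference phone recognised as a different phone) and flags the pairs whose rate is significantly higher for children than for adults. The test, the function names, the significance threshold, and the toy counts are all assumptions made for illustration; none of them are taken from the paper. Deletion errors are not modelled here.

```python
import math


def substitution_rate_z(count_a, total_a, count_b, total_b):
    """Two-proportion z-test for count_a/total_a versus count_b/total_b.

    Returns (z, two_sided_p). This is an assumed test, not necessarily
    the one used in the paper.
    """
    p_a = count_a / total_a
    p_b = count_b / total_b
    pooled = (count_a + count_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def compare_confusions(child, adult, alpha=0.05):
    """Flag substitutions that are significantly more frequent for children.

    child, adult: dict mapping reference phone -> {recognised phone: count}.
    Returns a list of (reference, recognised, z, p) sorted by p-value.
    """
    flagged = []
    for ref, child_row in child.items():
        adult_row = adult.get(ref)
        if adult_row is None:
            continue
        n_child = sum(child_row.values())
        n_adult = sum(adult_row.values())
        if n_child == 0 or n_adult == 0:
            continue
        for hyp, child_count in child_row.items():
            if hyp == ref:
                continue  # correct recognition, not a substitution
            adult_count = adult_row.get(hyp, 0)
            z, p = substitution_rate_z(child_count, n_child, adult_count, n_adult)
            if z > 0 and p < alpha:
                flagged.append((ref, hyp, z, p))
    return sorted(flagged, key=lambda item: item[3])


# Toy counts, invented purely for illustration (not data from the paper).
child_toy = {"r": {"r": 60, "w": 40}, "s": {"s": 90, "z": 10}}
adult_toy = {"r": {"r": 95, "w": 5}, "s": {"s": 91, "z": 9}}

flagged = compare_confusions(child_toy, adult_toy)
for ref, hyp, z, p in flagged:
    print(f"{ref} -> {hyp}: z = {z:.2f}, p = {p:.3g}")
```

With a full phone set, many phone pairs are tested at once, so a correction for multiple comparisons would be needed before reading any single flagged pair as significant. Whether a flagged substitution counts as type A or type B would still have to be decided separately, by checking it against known phonological patterns.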

Details

Original language: English
Number of pages: 5
Publication status: Published - 2015
Event: SLaTE 2015 - Leipzig, Germany
Duration: 4 Sep 2015 to 5 Sep 2015

Conference

Conference: SLaTE 2015
Country: Germany
City: Leipzig
Period: 4/09/15 to 5/09/15