Analysis of phone errors in computer recognition of children’s speech

Evangelia Fringi, Jill Lehman, Martin Russell

Research output: Contribution to conference (unpublished), Paper, peer-review


Automatic speech recognition (ASR) for children’s speech is
more difficult than for adults’ speech. This paper explores two
explanations of this phenomenon, namely (A) that it is due to
predictable phonological effects associated with language acquisition
in children, or (B) that it is due to the general increase
in acoustic variability that has been observed in children’s
speech. Phone recognition experiments are conducted on
hand-labelled data for children aged between 5 and 6. A statistical
comparison of the resulting confusion matrix with that for
adult speech (TIMIT) shows significant increases in phone substitution
rates for children, some of which correspond to established
phonological phenomena (type A errors). However, these
only account for a small proportion of errors, and those associated
with general acoustic variability (type B) appear to account
for the majority. The study also shows significantly more deletion
errors in ASR for children’s speech. Overall, the results
suggest that attempts to improve ASR accuracy for children’s
speech by accommodating phonological phenomena associated
with language acquisition, for example by changing the pronunciation
dictionary, are unlikely to deliver significant success in
the short term, and that coping with the increased acoustic variability
in children’s speech should be the immediate priority.
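
The abstract does not say which statistical test was used to compare the confusion matrices. As a minimal sketch of how one substitution rate could be compared between child and adult data, the following assumes a two-proportion z-test on a single cell of each matrix. The function name and all counts are hypothetical and are not taken from the paper.

```python
import math


def substitution_rate_z(count_a, total_a, count_b, total_b):
    """Two-proportion z-test for one phone substitution (hypothetical helper).

    count_*: times the reference phone was recognised as the other phone
    total_*: total occurrences of the reference phone in that corpus
    Returns (z, two-sided p-value).
    """
    p_a = count_a / total_a
    p_b = count_b / total_b
    # Pooled rate under the null hypothesis that both corpora share one rate.
    pooled = (count_a + count_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        raise ValueError("pooled rate is 0 or 1; the test is undefined")
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: 60 of 400 tokens substituted in child speech,
# 20 of 400 in adult speech.
z, p = substitution_rate_z(60, 400, 20, 400)
print(f"z = {z:.2f}, p = {p:.2g}")
```

Running a test like this over every cell of a confusion matrix would also require a correction for multiple comparisons, which this sketch leaves out.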
Original language: English
Number of pages: 5
Publication status: Published - 2015
Event: SLaTE 2015 - Leipzig, Germany
Duration: 4 Sept 2015 to 5 Sept 2015


Conference: SLaTE 2015

