Analysis of phone errors in computer recognition of children’s speech

Evangelia Fringi; Jill Lehman; Martin Russell

Electronic, Electrical and Systems Engineering

Research output: Contribution to conference (unpublished) › Paper › peer-review

Abstract

Automatic speech recognition (ASR) for children’s speech is
more difficult than for adults’ speech. This paper explores two
explanations of this phenomenon, namely (A) that it is due to
predictable phonological effects associated with language acquisition
in children, or (B) that it is due to the general increase
in acoustic variability that has been observed in children’s
speech. Phone recognition experiments are conducted on
hand-labelled data for children aged between 5 and 6. A statistical
comparison of the resulting confusion matrix with that for
adult speech (TIMIT) shows significant increases in phone substitution
rates for children, some of which correspond to established
phonological phenomena (type A errors). However, these
only account for a small proportion of errors, and those associated
with general acoustic variability (type B) appear to account
for the majority. The study also shows significantly more deletion
errors in ASR for children’s speech. Overall, the results
suggest that attempts to improve ASR accuracy for children’s
speech by accommodating phonological phenomena associated
with language acquisition, for example by changing the pronunciation
dictionary, are unlikely to deliver significant success in
the short term, and that coping with the increased acoustic variability
in children’s speech should be the immediate priority.

Original language	English
Number of pages	5
Publication status	Published - 2015
Event	SLaTE 2015 - Leipzig, Germany Duration: 4 Sept 2015 → 5 Sept 2015

Conference

Conference	SLaTE 2015
Country/Territory	Germany
City	Leipzig
Period	4/09/15 → 5/09/15

Cite this

@conference{3ed50bcf7131410fa31ac69a707fa201,

title = "Analysis of phone errors in computer recognition of children{\textquoteright}s speech",

abstract = "Automatic speech recognition (ASR) for children{\textquoteright}s speech is more difficult than for adults{\textquoteright} speech. This paper explores two explanations of this phenomenon, namely (A) that it is due to predictable phonological effects associated with language acquisition in children, or (B) that it is due to the general increase in acoustic variability that has been observed in children{\textquoteright}s speech. Phone recognition experiments are conducted on hand labelled data for children aged between 5 and 6. A statistical comparison of the resulting confusion matrix with that for adult speech (TIMIT) shows significant increases in phone substitution rates for children, some of which correspond to established phonological phenomena (type A errors). However these only account for a small proportion of errors, and those associated with general acoustic variability (type B) appear to account for the majority. The study also shows significantly more deletion errors in ASR for children{\textquoteright}s speech. Overall, the results suggest that attempts to improve ASR accuracy for children{\textquoteright}s speech by accommodating phonological phenomena associated with language acquisition, for example by changing the pronunciation dictionary, are unlikely to deliver significant success in the short term, and that coping with the increased acoustic variability in children{\textquoteright}s speech should be the immediate priority.",

author = "Evangelia Fringi and Jill Lehman and Martin Russell",

year = "2015",

language = "English",

note = "SLaTE 2015 ; Conference date: 04-09-2015 Through 05-09-2015",

}

TY - CONF

T1 - Analysis of phone errors in computer recognition of children’s speech

AU - Fringi, Evangelia

AU - Lehman, Jill

AU - Russell, Martin

PY - 2015

Y1 - 2015

N2 - Automatic speech recognition (ASR) for children’s speech is more difficult than for adults’ speech. This paper explores two explanations of this phenomenon, namely (A) that it is due to predictable phonological effects associated with language acquisition in children, or (B) that it is due to the general increase in acoustic variability that has been observed in children’s speech. Phone recognition experiments are conducted on hand labelled data for children aged between 5 and 6. A statistical comparison of the resulting confusion matrix with that for adult speech (TIMIT) shows significant increases in phone substitution rates for children, some of which correspond to established phonological phenomena (type A errors). However these only account for a small proportion of errors, and those associated with general acoustic variability (type B) appear to account for the majority. The study also shows significantly more deletion errors in ASR for children’s speech. Overall, the results suggest that attempts to improve ASR accuracy for children’s speech by accommodating phonological phenomena associated with language acquisition, for example by changing the pronunciation dictionary, are unlikely to deliver significant success in the short term, and that coping with the increased acoustic variability in children’s speech should be the immediate priority.

M3 - Paper

T2 - SLaTE 2015

Y2 - 4 September 2015 through 5 September 2015

ER -
