Abstract
Automatic speech recognition (ASR) for children’s speech is
more difficult than for adults’ speech. This paper explores two
explanations of this phenomenon, namely (A) that it is due to
predictable phonological effects associated with language acquisition
in children, or (B) that it is due to the general increase
in acoustic variability that has been observed in children’s
speech. Phone recognition experiments are conducted on
hand-labelled data for children aged between 5 and 6. A statistical
comparison of the resulting confusion matrix with that for
adult speech (TIMIT) shows significant increases in phone substitution
rates for children, some of which correspond to established
phonological phenomena (type A errors). However, these
only account for a small proportion of errors, and those associated
with general acoustic variability (type B) appear to account
for the majority. The study also shows significantly more deletion
errors in ASR for children’s speech. Overall, the results
suggest that attempts to improve ASR accuracy for children’s
speech by accommodating phonological phenomena associated
with language acquisition, for example by changing the pronunciation
dictionary, are unlikely to deliver significant success in
the short term, and that coping with the increased acoustic variability
in children’s speech should be the immediate priority.
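
The abstract does not say which statistical test was used to compare the two confusion matrices. As a minimal sketch, assuming a one-sided two-proportion z-test applied to a single cell of each matrix, the comparison of one phone substitution rate between children and adults might look like the following. The function name and the counts in the usage note are hypothetical and are not taken from the paper.

```python
import math


def substitution_rate_increase(child_subs, child_total, adult_subs, adult_total):
    """One-sided two-proportion z-test (an assumed test, not the paper's).

    child_subs / adult_subs: times a reference phone was recognised as one
    particular other phone (a single confusion-matrix cell).
    child_total / adult_total: total occurrences of that reference phone.
    Returns (z, p_value), where p_value is the upper-tail probability that
    the child rate exceeds the adult rate by chance.
    """
    pooled = (child_subs + adult_subs) / (child_total + adult_total)
    se = math.sqrt(pooled * (1 - pooled) * (1 / child_total + 1 / adult_total))
    if se == 0.0:
        # Degenerate case: no variation at all, so no evidence of an increase.
        return 0.0, 1.0
    z = (child_subs / child_total - adult_subs / adult_total) / se
    # Upper tail of the standard normal distribution, P(Z > z).
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value
```

With hypothetical counts, `substitution_rate_increase(60, 400, 30, 600)` tests whether a 15% child substitution rate is significantly higher than a 5% adult rate. Because a full confusion matrix contains many cells, a real analysis would also need a correction for multiple comparisons.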
Original language | English |
---|---|
Number of pages | 5 |
Publication status | Published - 2015 |
Event | SLaTE 2015 - Leipzig, Germany |
Duration | 4 Sept 2015 → 5 Sept 2015 |
Conference
Conference | SLaTE 2015 |
---|---|
Country/Territory | Germany |
City | Leipzig |
Period | 4/09/15 → 5/09/15 |