Abstract
Automatic speech recognition (ASR) for children’s speech is
more difficult than for adults’ speech. A plausible explanation
is that ASR errors are due to predictable phonological effects
associated with language acquisition. We describe phone recognition
experiments on hand-labelled data for children aged between
5 and 9. A comparison of the resulting confusion matrices
with those for adult speech (TIMIT) shows increased phone
substitution rates for children, which correspond to some extent
to established phonological phenomena. However, these errors
still account for only a relatively small proportion of the problem.
This suggests that attempts to improve ASR accuracy on children’s
speech by accommodating these phenomena, for example
by changing the pronunciation dictionary, cannot solve the
whole problem.
Original language | English |
---|---|
Number of pages | 4 |
Publication status | Published - 2015 |
Event | Interspeech 2015 - Dresden, Germany. Duration: 6 Sept 2015 → 10 Sept 2015 |
Conference
Conference | Interspeech 2015 |
---|---|
Country/Territory | Germany |
City | Dresden |
Period | 6/09/15 → 10/09/15 |
Keywords
- children’s speech, phonological processes, automatic