With a view to using an articulatory representation in automatic recognition of conversational speech, two nonlinear methods for mapping from formants to short-term spectra were investigated: multilayered perceptrons (MLPs) and radial basis function (RBF) networks. Five schemes for dividing the TIMIT data according to phone class were tested. The r.m.s. error of the RBF networks was 10% less than that of the MLPs, and the scheme based on discrete articulatory regions gave the greatest improvement over a single network.
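To make the formant-to-spectrum mapping concrete, the following is a minimal sketch of an RBF network scored by r.m.s. error. It is not the authors' implementation: the data are synthetic stand-ins for TIMIT, and the dimensions (3 formants, 8 spectral channels), the number of centres, the Gaussian width, and all function names are assumptions chosen for illustration. Centres are drawn at random from the training inputs and the output weights are solved by linear least squares, which is one common way to train such a network.

```python
import numpy as np

def rbf_design(x, centres, width):
    """Gaussian basis activations for each input row, plus a bias column."""
    sq_dist = ((x[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-sq_dist / (2.0 * width ** 2))
    return np.hstack([phi, np.ones((x.shape[0], 1))])

def fit_rbf(x, y, n_centres, width, rng):
    """Pick centres from the training inputs, then solve the linear output layer."""
    idx = rng.choice(x.shape[0], size=n_centres, replace=False)
    centres = x[idx]
    weights, *_ = np.linalg.lstsq(rbf_design(x, centres, width), y, rcond=None)
    return centres, weights

def predict_rbf(x, centres, weights, width):
    return rbf_design(x, centres, width) @ weights

def rms_error(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Synthetic data (assumption): normalised "formants" and a smooth nonlinear "spectrum".
rng = np.random.default_rng(0)
formants = rng.uniform(0.0, 1.0, size=(400, 3))
spectra = np.sin(formants @ rng.normal(size=(3, 8)))
x_train, y_train = formants[:300], spectra[:300]
x_test, y_test = formants[300:], spectra[300:]

WIDTH = 0.5  # assumed Gaussian width
centres, weights = fit_rbf(x_train, y_train, n_centres=40, width=WIDTH, rng=rng)

train_error = rms_error(y_train, predict_rbf(x_train, centres, weights, WIDTH))
test_error = rms_error(y_test, predict_rbf(x_test, centres, weights, WIDTH))
# Baseline: always predict the mean training spectrum.
mean_spectrum = np.broadcast_to(y_train.mean(axis=0), y_train.shape)
baseline_train_error = rms_error(y_train, mean_spectrum)

print(f"train r.m.s. error:    {train_error:.4f}")
print(f"test r.m.s. error:     {test_error:.4f}")
print(f"baseline (train mean): {baseline_train_error:.4f}")
```

Under the same assumptions, a phone-class scheme of the kind described above could be approximated by calling `fit_rbf` once per class and comparing the pooled r.m.s. error with that of a single network trained on all the data.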
|Number of pages||3|
|Publication status||Published - 1 Jan 2002|