Phone classification using a non-linear manifold with broad phone class dependent DNNs

Linxue Bai, Peter Jancovic, Martin Russell, Philip Weber, Stephen Houghton

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Most state-of-the-art automatic speech recognition (ASR) systems use a single deep neural network (DNN) to map the acoustic space to the decision space. However, different phonetic classes employ different production mechanisms and are best described by different types of features. Hence it may be advantageous to replace this single DNN with several phone class dependent DNNs. The appropriate mathematical formalism for this is a manifold. This paper assesses the use of a non-linear manifold structure with multiple DNNs for phone classification. The system has two levels. The first comprises a set of broad phone class (BPC) dependent DNN-based mappings and the second level is a fusion network. Various ways of designing and training the networks in both levels are assessed, including varying the size of hidden layers, the use of the bottleneck or softmax outputs as input to the fusion network, and the use of different broad class definitions. Phone classification experiments are performed on TIMIT. The results show that using the BPC-dependent DNNs provides small but significant improvements in phone classification accuracy relative to a single global DNN. The paper concludes with visualisations of the structures learned by the local and global DNNs and discussion of their interpretations.
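
The two-level structure described in the abstract can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the layer sizes, feature dimension, number of broad phone classes, number of phone labels, sigmoid activations, and random untrained weights below are all placeholder assumptions, and the helper names (init_mlp, forward, build_system, classify) are hypothetical. The sketch only shows how several BPC-dependent networks could feed either their bottleneck or their softmax outputs into a fusion network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder sizes, chosen for illustration only (not taken from the paper).
N_FEAT, N_PHONES, N_BPC, HIDDEN, BOTTLENECK = 40, 39, 5, 64, 16


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def init_mlp(sizes):
    """Random (untrained) weights and zero biases for a fully connected network."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def forward(params, x):
    """Return the output of every layer: sigmoid hidden layers, softmax output."""
    acts, h = [], x
    for i, (W, b) in enumerate(params):
        z = h @ W + b
        h = softmax(z) if i == len(params) - 1 else 1.0 / (1.0 + np.exp(-z))
        acts.append(h)
    return acts


def build_system(use_bottleneck):
    """Level 1: one network per broad phone class. Level 2: a fusion network."""
    local_nets = [init_mlp([N_FEAT, HIDDEN, BOTTLENECK, N_PHONES])
                  for _ in range(N_BPC)]
    per_net_dim = BOTTLENECK if use_bottleneck else N_PHONES
    fusion_net = init_mlp([N_BPC * per_net_dim, HIDDEN, N_PHONES])
    return local_nets, fusion_net


def classify(x, local_nets, fusion_net, use_bottleneck):
    """Concatenate the chosen output of each local network and fuse them."""
    feats = []
    for net in local_nets:
        acts = forward(net, x)
        # acts[-2] is the bottleneck layer, acts[-1] the softmax posteriors.
        feats.append(acts[-2] if use_bottleneck else acts[-1])
    return forward(fusion_net, np.concatenate(feats, axis=1))[-1]


x = rng.normal(size=(4, N_FEAT))  # four dummy acoustic feature vectors
local_nets, fusion_net = build_system(use_bottleneck=True)
posteriors = classify(x, local_nets, fusion_net, use_bottleneck=True)
predicted_phone = posteriors.argmax(axis=1)
```

Switching `use_bottleneck` to `False` in both calls feeds the softmax outputs of the local networks to the fusion network instead, which is the other input option the abstract mentions. Training of either level is omitted here.
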
Original language: English
Title of host publication: Proceedings of Interspeech 2017
Number of pages: 5
Publication status: Published - 24 Aug 2017
Event: Interspeech 2017: Situated Interaction - Stockholm University, Stockholm, Sweden
Duration: 20 Aug 2017 to 24 Aug 2017
http://www.interspeech2017.org/

Conference

Conference: Interspeech 2017
Country/Territory: Sweden
City: Stockholm
Period: 20/08/17 to 24/08/17