Phone recognition using a non-linear manifold with broad phone class dependent DNNs

Mengjie Qian, Linxue Bai, Peter Jancovic, Martin Russell

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Although it is generally accepted that different broad phone classes (BPCs) have different production mechanisms and are better described by different types of features, most automatic speech recognition (ASR) systems use the same features and decision criteria for all phones. Motivated by this observation, this paper proposes a two-level DNN structure, referred to as a BPC-DNN, inspired by the notion of a topological manifold. In the first level, several small, separate BPC-dependent DNNs are applied to different broad phonetic classes, and in the second level the outputs of these DNNs are fused to obtain senone-dependent posterior probabilities, which can be used for frame-level classification or integrated into Viterbi decoding for phone recognition. In a previous paper using this approach, we reported improved frame classification accuracy on the TIMIT corpus compared with a conventional DNN. The contribution of the present paper is to demonstrate that this advantage extends to full phone recognition. Our most recent results show that the BPC-DNN achieves reductions in error rate relative to a conventional DNN of 16% and 8% for frame classification and phone recognition, respectively.
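The two-level structure described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not specify the fusion mechanism, the layer sizes, the input features, or how frames are routed to the first-level networks, so everything below is an assumption made for illustration: each BPC-dependent network sees every frame, the fusion stage is a small feed-forward network over the concatenated first-level outputs, and all names (`init_mlp`, `mlp_forward`, `bpc_dnn_posteriors`) and dimensions are hypothetical. The weights are random and untrained, so the outputs only demonstrate the data flow, not the authors' results.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def init_mlp(rng, sizes):
    # Random (untrained) weights; sizes like [in, hidden, out]. Hypothetical helper.
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_forward(x, layers):
    # ReLU on hidden layers; the final layer is left linear.
    h = x
    for i, (W, b) in enumerate(layers):
        h = h @ W + b
        if i < len(layers) - 1:
            h = np.maximum(h, 0.0)
    return h

def bpc_dnn_posteriors(frames, bpc_nets, fusion_net):
    # Level 1: every BPC-dependent network processes every frame (assumption).
    level1 = [mlp_forward(frames, net) for net in bpc_nets]
    # Level 2: concatenate the first-level outputs and fuse them (assumed fusion
    # scheme) into one posterior distribution over senones per frame.
    fused_input = np.concatenate(level1, axis=-1)
    return softmax(mlp_forward(fused_input, fusion_net))

# Illustrative dimensions only; none of these values come from the paper.
rng = np.random.default_rng(0)
n_frames, feat_dim, n_bpc, bpc_out, n_senones = 4, 39, 3, 16, 10

bpc_nets = [init_mlp(rng, [feat_dim, 32, bpc_out]) for _ in range(n_bpc)]
fusion_net = init_mlp(rng, [n_bpc * bpc_out, 32, n_senones])

frames = rng.standard_normal((n_frames, feat_dim))
post = bpc_dnn_posteriors(frames, bpc_nets, fusion_net)

# Frame-level classification: pick the most probable senone for each frame.
frame_labels = post.argmax(axis=1)
```

For phone recognition, the per-frame posteriors in `post` would be passed to a Viterbi decoder in place of the `argmax` step; that decoder is outside the scope of this sketch.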
Original language: English
Title of host publication: Proceedings of Interspeech 2018
Place of Publication: Hyderabad, India
Number of pages: 5
Publication status: Published - 3 Sept 2018
Event: Interspeech 2018 - Hyderabad International Convention Centre, Hyderabad, India
Duration: 2 Sept 2018 to 6 Sept 2018

Publication series

ISSN (Electronic): 1990-9772


Conference: Interspeech 2018


  • manifold learning
  • phone classification
  • speech recognition
  • neural network
  • broad phone classes

