Phone recognition using a non-linear manifold with broad phone class dependent DNNs

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution

Abstract

Although it is generally accepted that different broad phone classes (BPCs) have different production mechanisms and are better described by different types of features, most automatic speech recognition (ASR) systems use the same features and decision criteria for all phones. Motivated by this observation, this paper proposes a two-level deep neural network (DNN) structure, referred to as a BPC-DNN, inspired by the notion of a topological manifold. In the first level, several small, separate BPC-dependent DNNs are applied to different BPCs; in the second level, the outputs of these DNNs are fused to obtain senone-dependent posterior probabilities, which can be used for frame-level classification or integrated into Viterbi decoding for phone recognition. In a previous paper using this approach, we reported improved frame classification accuracy on the TIMIT corpus compared with a conventional DNN. The contribution of the present paper is to demonstrate that this advantage extends to full phone recognition. Our most recent results show that, relative to a conventional DNN, the BPC-DNN reduces the error rate by 16% for frame classification and by 8% for phone recognition.
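The two-level structure described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the layer sizes, the number of BPCs, the random untrained weights, and the choice to fuse by concatenation followed by a single softmax layer are all assumptions made for illustration, and every name below (`bpc_nets`, `bpc_dnn_posteriors`, and so on) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """Random weights for a small fully connected net (illustrative, untrained)."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_forward(params, x):
    """Forward pass applying ReLU after every layer."""
    for w, b in params:
        x = np.maximum(x @ w + b, 0.0)
    return x

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical sizes; none of these values come from the paper.
N_FEATS, N_BPC, BPC_OUT, N_SENONES = 40, 5, 16, 100

# Level 1: one small network per broad phone class.
bpc_nets = [init_mlp([N_FEATS, 32, BPC_OUT]) for _ in range(N_BPC)]

# Level 2 (assumed form): one layer mapping the concatenated
# level-1 outputs to senone scores.
fusion_w = rng.standard_normal((N_BPC * BPC_OUT, N_SENONES)) * 0.1
fusion_b = np.zeros(N_SENONES)

def bpc_dnn_posteriors(frames):
    """Return per-frame senone posteriors, shape (n_frames, N_SENONES)."""
    level1 = [mlp_forward(net, frames) for net in bpc_nets]
    fused = np.concatenate(level1, axis=-1)
    return softmax(fused @ fusion_w + fusion_b)

frames = rng.standard_normal((8, N_FEATS))  # 8 frames of stand-in features
post = bpc_dnn_posteriors(frames)
frame_labels = post.argmax(axis=-1)  # frame-level classification
```

Taking the argmax of each row of `post` gives the frame-level decision. For phone recognition, the same posteriors would instead be passed to a Viterbi decoder, which is not shown here.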

Details

Original language: English
Title of host publication: Proceedings of Interspeech 2018
Publication status: Published - 3 Sep 2018
Event: Interspeech 2018 - Hyderabad International Convention Centre, Hyderabad, India
Duration: 2 Sep 2018 - 6 Sep 2018

Publication series

Name: Interspeech
Volume: 2018
ISSN (Electronic): 1990-9772

Conference

Conference: Interspeech 2018
Country: India
City: Hyderabad
Period: 2/09/18 - 6/09/18

Keywords

  • manifold learning, phone classification, speech recognition, neural network, broad phone classes