Exploring how phone classification neural networks learn phonetic information by visualising and interpreting bottleneck features

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

Neural networks have a reputation for being "black boxes", and it has been suggested that techniques from user interface development, and visualisation in particular, could help open them up. In this paper, we explore two kinds of bottleneck features (BNFs): 9-dimensional BNFs, which our earlier work showed to represent speech well for speech recognition, and 2-dimensional BNFs extracted directly from bottleneck neural networks. The 9-dimensional BNFs obtained from a phone classification neural network are visualised in 2-dimensional spaces using linear discriminant analysis (LDA) and t-distributed stochastic neighbour embedding (t-SNE). The 2-dimensional BNF space is analysed with regard to phonetic features. A back-propagation method is used to create "cardinal" features for each phone under a particular neural network. The visualisations of both the 9-dimensional and the 2-dimensional BNFs show distinctions between most phone categories. In particular, the 2-dimensional BNF space appears to be a union of subspaces related to phonetic categories, each of which preserves local structure, and within each subspace the organisation of phones appears to correspond to phone production mechanisms. By applying LDA to the features of the higher-dimensional non-bottleneck layers, we observe a triangular pattern which may indicate that silence, friction and voicing are the three main properties learned by the neural networks.
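The LDA step described above, projecting labelled 9-dimensional features onto the two most phone-discriminative directions, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `lda_project` is hypothetical, the data are synthetic Gaussian clusters standing in for real BNFs, and three made-up classes replace the actual phone set. The same function would apply unchanged to activations from the higher-dimensional non-bottleneck layers. t-SNE, being a nonlinear embedding, is not sketched here; a library implementation such as scikit-learn's `sklearn.manifold.TSNE` would typically be used for it.

```python
import numpy as np


def lda_project(X, y, n_components=2):
    """Project rows of X onto the n_components most discriminative LDA directions.

    X: (n_samples, n_features) feature matrix, e.g. 9-dimensional BNFs.
    y: (n_samples,) integer class (phone) labels.
    Returns the projected features and all generalised eigenvalues, descending.
    """
    classes = np.unique(y)
    mean_total = X.mean(axis=0)
    d = X.shape[1]
    S_w = np.zeros((d, d))  # within-class scatter
    S_b = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        centred = Xc - mean_c
        S_w += centred.T @ centred
        diff = (mean_c - mean_total)[:, None]
        S_b += len(Xc) * (diff @ diff.T)
    # Solve S_b w = lambda S_w w by whitening with the Cholesky factor S_w = L L^T,
    # which turns it into a symmetric eigenproblem with real eigenvalues.
    L_inv = np.linalg.inv(np.linalg.cholesky(S_w))
    M = L_inv @ S_b @ L_inv.T
    M = (M + M.T) / 2  # remove tiny numerical asymmetry before eigh
    eigvals, V = np.linalg.eigh(M)
    order = np.argsort(eigvals)[::-1]
    W = L_inv.T @ V[:, order[:n_components]]
    return (X - mean_total) @ W, eigvals[order]


# Synthetic stand-in for BNFs: three hypothetical "phone" classes in 9 dimensions.
rng = np.random.default_rng(0)
class_means = rng.normal(scale=3.0, size=(3, 9))
X = np.vstack([m + rng.normal(size=(100, 9)) for m in class_means])
y = np.repeat(np.arange(3), 100)

Z, eigvals = lda_project(X, y)
print(Z.shape)  # (300, 2)
print(eigvals.round(3))  # at most (number of classes - 1) clearly non-zero values
```

Whitening by the within-class scatter means the projected classes have unit total within-class scatter, so distances between class centroids in the 2-dimensional plot are directly comparable to the spread inside each class.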

Details

Original language: English
Title of host publication: Proceedings of Interspeech 2018
Publication status: Published - 3 Sep 2018
Event: Interspeech 2018 - Hyderabad International Convention Centre, Hyderabad, India
Duration: 2 Sep 2018 to 6 Sep 2018

Publication series

Name: Interspeech
Volume: 2018
ISSN (Electronic): 1990-9772

Conference

Conference: Interspeech 2018
Country: India
City: Hyderabad
Period: 2/09/18 to 6/09/18

Keywords

  • neural network, interpretation, visualisation, bottleneck features, phonetic features, phone classification