Exploring how phone classification neural networks learn phonetic information by visualising and interpreting bottleneck features

Linxue Bai, Philip Weber, Peter Jancovic, Martin Russell

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Neural networks have a reputation for being "black boxes", a reputation that, it has been suggested, techniques from user interface development, and visualisation in particular, could help dispel. In this paper, we explore two kinds of bottleneck features (BNFs): 9-dimensional BNFs, which our earlier work has shown to represent speech well in the context of speech recognition, and 2-dimensional BNFs extracted directly from bottleneck neural networks. The 9-dimensional BNFs obtained from a phone classification neural network are visualised in 2-dimensional spaces using linear discriminant analysis (LDA) and t-distributed stochastic neighbour embedding (t-SNE). The 2-dimensional BNF space is analysed with regard to phonetic features. A back-propagation method is used to create "cardinal" features for each phone under a particular neural network. The visualisations of both the 9-dimensional and the 2-dimensional BNFs show distinctions between most phone categories. In particular, the 2-dimensional BNF space seems to be a union of subspaces related to phonetic categories; local structure is preserved within each subspace, where the organisation of phones appears to correspond to phone production mechanisms. By applying LDA to the features of higher-dimensional non-bottleneck layers, we observe a triangular pattern, which may indicate that silence, friction and voicing are the three main properties learned by the neural networks.
Original language: English
Title of host publication: Proceedings of Interspeech 2018
Place of Publication: Hyderabad, India
Publisher: ISCA
Pages: 1472-1476
Number of pages: 5
DOIs
Publication status: Published - 3 Sept 2018
Event: Interspeech 2018 - Hyderabad International Convention Centre, Hyderabad, India
Duration: 2 Sept 2018 to 6 Sept 2018

Publication series

Name: Interspeech
Volume: 2018
ISSN (Electronic): 1990-9772

Conference

Conference: Interspeech 2018
Country/Territory: India
City: Hyderabad
Period: 2/09/18 to 6/09/18

Keywords

  • neural network
  • interpretation
  • visualisation
  • bottleneck features
  • phonetic features
  • phone classification
