Exploring how phone classification neural networks learn phonetic information by visualising and interpreting bottleneck features

Linxue Bai; Philip Weber; Peter Jancovic; Martin Russell

doi:10.21437/Interspeech.2018-2462

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

386 Downloads (Pure)

Abstract

Neural networks have a reputation for being "black boxes", and it has been suggested that techniques from user interface development, and visualisation in particular, could help lift this reputation. In this paper, we explore 9-dimensional bottleneck features (BNFs), which our earlier work has shown to represent speech well in the context of speech recognition, and 2-dimensional BNFs extracted directly from bottleneck neural networks. The 9-dimensional BNFs obtained from a phone classification neural network are visualised in 2-dimensional spaces using linear discriminant analysis (LDA) and t-distributed stochastic neighbour embedding (t-SNE). The 2-dimensional BNF space is analysed with regard to phonetic features. A back-propagation method is used to create "cardinal" features for each phone under a particular neural network. The visualisations of both the 9-dimensional and the 2-dimensional BNFs show distinctions between most phone categories. In particular, the 2-dimensional BNF space seems to be a union of subspaces related to phonetic categories, each of which preserves local structure, and within which the organisation of phones appears to correspond to phone production mechanisms. By applying LDA to the features of higher-dimensional non-bottleneck layers, we observe a triangular pattern, which may indicate that silence, friction and voicing are the three main properties learned by the neural networks.
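The LDA projection step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 9-dimensional vectors are synthetic Gaussian clusters standing in for real BNFs, the three classes are arbitrary, and the function names `lda_project` and `make_toy_bnfs` are hypothetical. The sketch uses the classical Fisher formulation (within-class and between-class scatter matrices) with NumPy only, and covers LDA but not t-SNE.

```python
import numpy as np


def lda_project(features, labels, n_components=2):
    """Project features onto the leading Fisher LDA directions."""
    dim = features.shape[1]
    overall_mean = features.mean(axis=0)
    s_within = np.zeros((dim, dim))
    s_between = np.zeros((dim, dim))
    for c in np.unique(labels):
        x_c = features[labels == c]
        mean_c = x_c.mean(axis=0)
        centred = x_c - mean_c
        # Scatter of the class around its own mean
        s_within += centred.T @ centred
        # Scatter of the class mean around the overall mean
        diff = (mean_c - overall_mean)[:, None]
        s_between += len(x_c) * (diff @ diff.T)
    # Directions maximising between-class relative to within-class scatter:
    # eigenvectors of inv(S_w) @ S_b with the largest eigenvalues
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(s_within, s_between))
    order = np.argsort(eigvals.real)[::-1][:n_components]
    directions = eigvecs[:, order].real
    return features @ directions, directions


def make_toy_bnfs(n_classes=3, n_per_class=200, dim=9, seed=0):
    """Synthetic stand-in for 9-dimensional BNFs (assumption, not real data)."""
    rng = np.random.default_rng(seed)
    means = rng.normal(scale=4.0, size=(n_classes, dim))
    features = np.vstack(
        [m + rng.normal(size=(n_per_class, dim)) for m in means]
    )
    labels = np.repeat(np.arange(n_classes), n_per_class)
    return features, labels


features, labels = make_toy_bnfs()
projected, directions = lda_project(features, labels)
print(projected.shape)  # one 2-dimensional point per input vector
```

The two columns of `projected` could then be drawn as a scatter plot coloured by class label. With real data, the labels would be phone classes and the features would be bottleneck-layer activations.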

Original language	English
Title of host publication	Proceedings of Interspeech 2018
Place of Publication	Hyderabad, India
Publisher	ISCA
Pages	1472-1476
Number of pages	5
DOIs	https://doi.org/10.21437/Interspeech.2018-2462
Publication status	Published - 3 Sept 2018
Event	Interspeech 2018 - Hyderabad International Convention Centre, Hyderabad, India. Duration: 2 Sept 2018 → 6 Sept 2018

Publication series

Name	Interspeech
Volume	2018
ISSN (Electronic)	1990-9772

Conference

Conference	Interspeech 2018
Country/Territory	India
City	Hyderabad
Period	2/09/18 → 6/09/18

Keywords

neural network
interpretation
visualisation
bottleneck features
phonetic features
phone classification

Access to Document

10.21437/Interspeech.2018-2462
Licence: None: All rights reserved

Linxue_Bai_et_al_Exploring_how_phone_classification_Proc_Interspeech_2018
Checked for eligibility: 13/09/2018
Bai, L., Weber, P., Jančovič, P., Russell, M. (2018) Exploring How Phone Classification Neural Networks Learn Phonetic Information by Visualising and Interpreting Bottleneck Features. Proc. Interspeech 2018, 1472-1476, DOI: 10.21437/Interspeech.2018-2462.
Final published version, 980 KB
Licence: Other (please specify with Rights Statement)

Cite this

@inproceedings{ce96eee5cc0c4aac941160cde80e5499,
title = "Exploring how phone classification neural networks learn phonetic information by visualising and interpreting bottleneck features",
keywords = "neural network, interpretation, visualisation, bottleneck features, phonetic features, phone classification",
author = "Linxue Bai and Philip Weber and Peter Jancovic and Martin Russell",
year = "2018",
month = sep,
day = "3",
doi = "10.21437/Interspeech.2018-2462",
language = "English",
series = "Interspeech",
publisher = "ISCA",
pages = "1472--1476",
booktitle = "Proceedings of Interspeech 2018",
note = "Interspeech 2018 ; Conference date: 02-09-2018 Through 06-09-2018",
}

Bai, L, Weber, P, Jancovic, P & Russell, M 2018, Exploring how phone classification neural networks learn phonetic information by visualising and interpreting bottleneck features. in Proceedings of Interspeech 2018. Interspeech, vol. 2018, ISCA, Hyderabad, India, pp. 1472-1476, Interspeech 2018, Hyderabad, India, 2/09/18. https://doi.org/10.21437/Interspeech.2018-2462

Exploring how phone classification neural networks learn phonetic information by visualising and interpreting bottleneck features. / Bai, Linxue; Weber, Philip; Jancovic, Peter et al.
Proceedings of Interspeech 2018. Hyderabad, India: ISCA, 2018. p. 1472-1476 (Interspeech; Vol. 2018).

TY - GEN
T1 - Exploring how phone classification neural networks learn phonetic information by visualising and interpreting bottleneck features
AU - Bai, Linxue
AU - Weber, Philip
AU - Jancovic, Peter
AU - Russell, Martin
PY - 2018/9/3
Y1 - 2018/9/3
KW - neural network
KW - interpretation
KW - visualisation
KW - bottleneck features
KW - phonetic features
KW - phone classification
U2 - 10.21437/Interspeech.2018-2462
DO - 10.21437/Interspeech.2018-2462
M3 - Conference contribution
T3 - Interspeech
SP - 1472
EP - 1476
BT - Proceedings of Interspeech 2018
PB - ISCA
CY - Hyderabad, India
T2 - Interspeech 2018
Y2 - 2 September 2018 through 6 September 2018
ER -
