Automatic speaker, age-group and gender identification from children's speech

Saeid Safavi; Martin Russell; Peter Jancovic

doi:10.1016/j.csl.2018.01.001

Electronic, Electrical and Systems Engineering

Research output: Contribution to journal › Article › peer-review

Abstract

A speech signal contains important paralinguistic information, such as the identity, age, gender, language, accent, and the emotional state of the speaker. Automatic recognition of these types of information in adults' speech has received considerable attention; however, there has been little work on children's speech. This paper focuses on speaker, gender, and age-group recognition from children's speech. The performances of several classification methods are compared, including Gaussian Mixture Model - Universal Background Model (GMM-UBM), GMM - Support Vector Machine (GMM-SVM) and i-vector based approaches. For speaker recognition, error rate decreases as age increases, as one might expect. However, for gender and age-group recognition the effect of age is more complex, due mainly to consequences of the onset of puberty. Finally, the utility of different frequency bands for speaker, age-group and gender recognition from children's speech is assessed.

Original language	English
Pages (from-to)	141-156
Number of pages	16
Journal	Computer Speech and Language
Volume	50
Early online date	9 Jan 2018
DOIs	https://doi.org/10.1016/j.csl.2018.01.001
Publication status	Published - Jul 2018

Keywords

speaker recognition
gender identification
children's speech
age-group identification
Gaussian Mixture Model
Support Vector Machine
i-vector

Access to Document

10.1016/j.csl.2018.01.001 (Licence: None: All rights reserved)

Safavi_et_al_Automatic_speaker_Computer_Speech_&_Language_2018 (Accepted author manuscript, 698 KB; Licence: Creative Commons: Attribution-NonCommercial-NoDerivs (CC BY-NC-ND))

Cite this

@article{e186fe70d6aa4a4582de34642f79d77a,

title = "Automatic speaker, age-group and gender identification from children's speech",

abstract = "A speech signal contains important paralinguistic information, such as the identity, age, gender, language, accent, and the emotional state of the speaker. Automatic recognition of these types of information in adults' speech has received considerable attention; however, there has been little work on children's speech. This paper focuses on speaker, gender, and age-group recognition from children's speech. The performances of several classification methods are compared, including Gaussian Mixture Model - Universal Background Model (GMM-UBM), GMM - Support Vector Machine (GMM-SVM) and i-vector based approaches. For speaker recognition, error rate decreases as age increases, as one might expect. However, for gender and age-group recognition the effect of age is more complex, due mainly to consequences of the onset of puberty. Finally, the utility of different frequency bands for speaker, age-group and gender recognition from children's speech is assessed.",

keywords = "speaker recognition, gender identification, children's speech, age-group identification, Gaussian Mixture Model, Support Vector Machine, i-vector",

author = "Saeid Safavi and Martin Russell and Peter Jancovic",

year = "2018",

month = jul,

doi = "10.1016/j.csl.2018.01.001",

language = "English",

volume = "50",

pages = "141--156",

journal = "Computer Speech and Language",

issn = "0885-2308",

publisher = "Elsevier",

}

TY - JOUR

T1 - Automatic speaker, age-group and gender identification from children's speech

AU - Safavi, Saeid

AU - Russell, Martin

AU - Jancovic, Peter

PY - 2018/7

Y1 - 2018/7

N2 - A speech signal contains important paralinguistic information, such as the identity, age, gender, language, accent, and the emotional state of the speaker. Automatic recognition of these types of information in adults' speech has received considerable attention; however, there has been little work on children's speech. This paper focuses on speaker, gender, and age-group recognition from children's speech. The performances of several classification methods are compared, including Gaussian Mixture Model - Universal Background Model (GMM-UBM), GMM - Support Vector Machine (GMM-SVM) and i-vector based approaches. For speaker recognition, error rate decreases as age increases, as one might expect. However, for gender and age-group recognition the effect of age is more complex, due mainly to consequences of the onset of puberty. Finally, the utility of different frequency bands for speaker, age-group and gender recognition from children's speech is assessed.

AB - A speech signal contains important paralinguistic information, such as the identity, age, gender, language, accent, and the emotional state of the speaker. Automatic recognition of these types of information in adults' speech has received considerable attention; however, there has been little work on children's speech. This paper focuses on speaker, gender, and age-group recognition from children's speech. The performances of several classification methods are compared, including Gaussian Mixture Model - Universal Background Model (GMM-UBM), GMM - Support Vector Machine (GMM-SVM) and i-vector based approaches. For speaker recognition, error rate decreases as age increases, as one might expect. However, for gender and age-group recognition the effect of age is more complex, due mainly to consequences of the onset of puberty. Finally, the utility of different frequency bands for speaker, age-group and gender recognition from children's speech is assessed.

KW - speaker recognition

KW - gender identification

KW - children's speech

KW - age-group identification

KW - Gaussian Mixture Model

KW - Support Vector Machine

KW - i-vector

U2 - 10.1016/j.csl.2018.01.001

DO - 10.1016/j.csl.2018.01.001

M3 - Article

SN - 0885-2308

VL - 50

SP - 141

EP - 156

JO - Computer Speech and Language

JF - Computer Speech and Language

ER -
