Morpheme-based Language Models for Improving the Speech Recognition of Arabic Dialects

Khalid Almeman; Mark Lee

Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, innovative experiments were conducted to
improve the Language Models (LMs) for three parallel dialects. For
each dialect, two different LMs were produced: a closed-domain
LM and an open-domain LM. The methodology of the second
part of the multi-dialect morphology analyser, which involved
retrieving web frequencies for different parts of a word, was
modified and then used to extract the three suggested forms
of the word: stem alone, prefix+stem and stem+suffix. Six results
were then extracted per dialect, giving a total of eighteen results.
All the experiments yielded positive results, with word error rate
(WER) improvements of between 0.5% and 6.8%.

Original language	English
Title of host publication	Proceedings of the 5th International Conference on Arabic Language Processing
Subtitle of host publication	(CITALA '14)
Publisher	CITALA
Pages	49-56
Publication status	Published - Nov 2014
Event	5th International Conference on Arabic Language Processing (CITALA '14), Oujda, Morocco, 26 Nov 2014 → 27 Nov 2014

Conference

Conference	5th International Conference on Arabic Language Processing (CITALA '14)
Country/Territory	Morocco
City	Oujda
Period	26/11/14 → 27/11/14

Cite this

@inproceedings{7fbefcd742d34a369425f3ef471c6529,

title = "Morpheme-based Language Models for Improving the Speech Recognition of Arabic Dialects",

abstract = "In this paper, innovative experiments were conducted to improve the Language Models (LMs) for three parallel dialects. For each dialect, two different LMs were produced: a closed-domain LM and an open-domain LM. The methodology of the second part of the multi-dialect morphology analyser, which involved retrieving web frequencies for different parts of a word, was modified and then used to extract the three suggested forms of the word: stem alone, prefix+stem and stem+suffix. Six results were then extracted per dialect, giving a total of eighteen results. All the experiments yielded positive results, with word error rate (WER) improvements of between 0.5% and 6.8%.",

author = "Khalid Almeman and Mark Lee",

year = "2014",

month = nov,

language = "English",

pages = "49--56",

booktitle = "Proceedings of the 5th International Conference on Arabic Language Processing",

publisher = "CITALA",

note = "5th International Conference on Arabic Language Processing (CITALA '14) ; Conference date: 26-11-2014 Through 27-11-2014",

}

TY - GEN

T1 - Morpheme-based Language Models for Improving the Speech Recognition of Arabic Dialects

AU - Almeman, Khalid

AU - Lee, Mark

PY - 2014/11

Y1 - 2014/11

N2 - In this paper, innovative experiments were conducted to improve the Language Models (LMs) for three parallel dialects. For each dialect, two different LMs were produced: a closed-domain LM and an open-domain LM. The methodology of the second part of the multi-dialect morphology analyser, which involved retrieving web frequencies for different parts of a word, was modified and then used to extract the three suggested forms of the word: stem alone, prefix+stem and stem+suffix. Six results were then extracted per dialect, giving a total of eighteen results. All the experiments yielded positive results, with word error rate (WER) improvements of between 0.5% and 6.8%.

AB - In this paper, innovative experiments were conducted to improve the Language Models (LMs) for three parallel dialects. For each dialect, two different LMs were produced: a closed-domain LM and an open-domain LM. The methodology of the second part of the multi-dialect morphology analyser, which involved retrieving web frequencies for different parts of a word, was modified and then used to extract the three suggested forms of the word: stem alone, prefix+stem and stem+suffix. Six results were then extracted per dialect, giving a total of eighteen results. All the experiments yielded positive results, with word error rate (WER) improvements of between 0.5% and 6.8%.

UR - http://www.citala.org/citala2014/acceptedpapers.html

UR - http://www.citala.org/icalp2017/#preEd

M3 - Conference contribution

SP - 49

EP - 56

BT - Proceedings of the 5th International Conference on Arabic Language Processing

PB - CITALA

T2 - 5th International Conference on Arabic Language Processing (CITALA '14)

Y2 - 26 November 2014 through 27 November 2014

ER -
