Abstract
In this paper, innovative experiments were conducted to
improve the Language Models (LMs) for three parallel dialects. For
each dialect, two different LMs were produced: a closed-domain
LM and an open-domain LM. The methodology of the second
part of the multi-dialect morphology analyser, which involved retrieving
web frequencies for different parts of a word, was modified
and then used to extract the three suggested forms
of the word: stem alone, prefix+stem and stem+suffix. Six results
were then extracted per dialect, giving a total of eighteen results.
All the experiments yielded positive results, with improvements of between
0.5% and 6.8% in word error rate (WER).
Original language | English |
---|---|
Title of host publication | Proceedings of the 5th International Conference on Arabic Language Processing |
Subtitle of host publication | (CITALA '14) |
Publisher | CITALA |
Pages | 49-56 |
Publication status | Published - Nov 2014 |
Event | 5th International Conference on Arabic Language Processing (CITALA '14), Oujda, Morocco. Duration: 26 Nov 2014 → 27 Nov 2014 |