Abstract
This paper reports experiments to improve the Language Models (LMs)
for three parallel dialects. For each dialect, two LMs were produced:
a closed-domain LM and an open-domain LM. The second part of the
multi-dialect morphology analyser retrieves web frequencies for the
different parts of a word; this methodology was modified and then used
to extract three suggested forms of each word: stem alone, prefix+stem
and stem+suffix. Six results were extracted per dialect, giving a total
of eighteen results. All experiments yielded improvements, ranging from
0.5% to 6.8% in word error rate (WER).
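
The results above are reported as word error rates. As a minimal sketch, assuming the standard definition of WER (word-level edit distance divided by the number of reference words), the metric can be computed as follows. This is not the authors' evaluation code; the function name `word_error_rate` and the example strings are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Assumes whitespace tokenisation; a real evaluation would normalise
    the text first (an assumption, not something stated in the abstract).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("a b c d", "a x c")` counts one substitution and one deletion against four reference words.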
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 5th International Conference on Arabic Language Processing |
| Subtitle of host publication | (CITALA '14) |
| Publisher | CITALA |
| Pages | 49-56 |
| Publication status | Published - Nov 2014 |
| Event | 5th International Conference on Arabic Language Processing (CITALA '14) - Oujda, Morocco; Duration: 26 Nov 2014 → 27 Nov 2014 |
Conference
| Conference | 5th International Conference on Arabic Language Processing (CITALA '14) |
|---|---|
| Country/Territory | Morocco |
| City | Oujda |
| Period | 26/11/14 → 27/11/14 |