Abstract
This paper describes the construction of a multi-dialect Arabic parallel speech corpus. It is designed to cover four main varieties: Modern Standard Arabic (MSA) and the Gulf, Egyptian and Levantine dialects. We chose a specific linguistic domain to work with: travel and tourism. Parallel prompts were written for all four varieties, yielding 1291 recordings for MSA and 1069 recordings for the other dialects. The recordings were made with the consent of 52 participants, and about 32 hours of speech were obtained in total. After the segmentation stage, the corpus comprises 67,132 speech files. These are the first Arabic parallel text and speech corpora, and they will be made openly available to researchers.
Original language | English |
---|---|
Title of host publication | 2013 1st International Conference on Communications, Signal Processing and Their Applications, ICCSPA 2013 |
DOIs | |
Publication status | Published - 16 Apr 2013 |
Event | 2013 1st International Conference on Communications, Signal Processing and Their Applications, ICCSPA 2013 - Sharjah, United Arab Emirates. Duration: 12 Feb 2013 → 14 Feb 2013 |
Conference
Conference | 2013 1st International Conference on Communications, Signal Processing and Their Applications, ICCSPA 2013 |
---|---|
Country/Territory | United Arab Emirates |
City | Sharjah |
Period | 12 Feb 2013 → 14 Feb 2013 |
Keywords
- Arabic Dialects
- Multi-Dialect
- Parallel
- Speech Corpora
ASJC Scopus subject areas
- Computer Networks and Communications
- Signal Processing