Multi dialect Arabic speech parallel corpora

Khalid Almeman*, Mark Lee, Ali Abdulrahman Almiman

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

15 Citations (Scopus)

Abstract

This paper describes the building of a multi-dialect Arabic speech parallel corpus. It is designed to encompass four main dialects: Modern Standard Arabic (MSA), Gulf, Egyptian, and Levantine. We chose a specific linguistic domain to work with: travel and tourism. Parallel prompts were written for the four main dialects, which involved 1291 recordings for MSA and 1069 recordings for other dialects. The recordings were conducted with the consent of 52 participants. We obtained about 32 hours of speech. After the segmentation stage, we obtained a total of 67,132 speech files. These are the first parallel Arabic text and speech corpora, and they will be open source for researchers.

Original language: English
Title of host publication: 2013 1st International Conference on Communications, Signal Processing and Their Applications, ICCSPA 2013
Publication status: Published - 16 Apr 2013
Event: 2013 1st International Conference on Communications, Signal Processing and Their Applications, ICCSPA 2013 - Sharjah, United Arab Emirates
Duration: 12 Feb 2013 - 14 Feb 2013

Conference

Conference: 2013 1st International Conference on Communications, Signal Processing and Their Applications, ICCSPA 2013
Country/Territory: United Arab Emirates
City: Sharjah
Period: 12/02/13 - 14/02/13

Keywords

  • Arabic Dialects
  • Multi-Dialect
  • Parallel
  • Speech Corpora

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Signal Processing
