Integrating character-level and word-level representation for affect in Arabic tweets

Abdullah I. Alharbi, Phillip Smith, Mark Lee

Research output: Contribution to journal › Article › peer-review

Abstract

Affect tasks, which range from sentiment polarity classification to finer-grained sentiment strength and emotional intensity detection, have attracted increasing interest due to the vast amount of user-generated content and advanced learning models. Word representation models have been leveraged effectively within a variety of natural language processing tasks. However, these models are not always effective in the context of social media. When dealing with social media posts in Arabic, the use of Arabic dialects needs to be considered. Although using informal text to train word-level models can lead to the identification of words that convey the same meaning, these models are unable to capture the full extent of the words that are used in the real world due to out-of-vocabulary (OOV) words. The inability to represent such words is one of the main limitations of word-level models. One approach to overcoming the OOV problem is to use character-level embeddings, as they can effectively learn vectors for word parts or character n-grams. This study uses a combination of character-level and word-level models to identify the most effective methods by which affective Arabic words in tweets can be represented semantically and morphologically. We evaluate our generated models and the proposed method by integrating them into a supervised learning framework that was used for a range of affect tasks and other related tasks. Our findings reveal that the developed models surpassed the performance of state-of-the-art Arabic pre-trained word embeddings over eight datasets. In addition, our models improve on previous state-of-the-art results on tasks involving Arabic emotion intensity, outperforming the top systems that used advanced ensemble learning models and several additional features.
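The idea of pairing a word-level vector with a character n-gram vector, so that OOV words still receive a non-trivial representation, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the dimensions, the n-gram range of 3 to 5, and the hash-based stand-in for trained n-gram vectors are all assumptions made for illustration only.

```python
import hashlib

WORD_DIM = 4   # assumed size of the word-level vectors (illustrative)
CHAR_DIM = 8   # assumed size of the character-level vectors (illustrative)


def char_ngrams(word, n_min=3, n_max=5):
    """Return the character n-grams of a word padded with boundary markers."""
    padded = f"<{word}>"
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


def ngram_vector(ngram, dim=CHAR_DIM):
    """Hypothetical stand-in for a trained n-gram embedding.

    A real model would learn these vectors; here they are derived
    deterministically from a hash so the example is self-contained.
    """
    digest = hashlib.sha256(ngram.encode("utf-8")).digest()
    return [(b / 255.0) * 2 - 1 for b in digest[:dim]]


def char_level_vector(word, dim=CHAR_DIM):
    """Average the vectors of all character n-grams of the word."""
    grams = char_ngrams(word)
    if not grams:
        return [0.0] * dim
    total = [0.0] * dim
    for gram in grams:
        for i, value in enumerate(ngram_vector(gram, dim)):
            total[i] += value
    return [t / len(grams) for t in total]


def combined_vector(word, word_vectors, word_dim=WORD_DIM):
    """Concatenate the word-level vector (zeros if OOV) with the character-level vector."""
    word_part = word_vectors.get(word, [0.0] * word_dim)
    return list(word_part) + char_level_vector(word)


# Toy word-level vocabulary with a single in-vocabulary word.
word_vectors = {"سعيد": [0.1, 0.2, 0.3, 0.4]}
in_vocab = combined_vector("سعيد", word_vectors)
oov = combined_vector("سعيييد", word_vectors)  # elongated spelling, not in the vocabulary
```

In this sketch the elongated spelling has an all-zero word-level part, but its character-level part is still populated because it shares several n-grams with the standard spelling. Concatenation is only one possible way to integrate the two representations; the abstract does not specify which integration the study uses.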

Original language: English
Article number: 101973
Number of pages: 14
Journal: Data and Knowledge Engineering
Volume: 138
Early online date: 6 Jan 2022
Publication status: Published - Mar 2022

Bibliographical note

Publisher Copyright:
© 2022 Elsevier B.V.

Keywords

  • Affect tasks
  • Arabic tweets
  • Character-level embeddings
  • Word-level embeddings

ASJC Scopus subject areas

  • Information Systems and Management
