Combining character and word embeddings for affect in Arabic informal social media microblogs

Abdullah I. Alharbi*, Mark Lee

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Word representation models have been applied successfully to many natural language processing tasks, including sentiment analysis. However, these models are not always effective in social media contexts. Arabic microblogs such as Twitter span a variety of linguistic domains, mainly because users write in different dialects. Training word-level models on such informal text lets them capture words that share a meaning, but these models cannot cover every word encountered in practice: out-of-vocabulary (OOV) words have no representation, which is one of the main limitations of word-level models. Character-level embeddings mitigate this problem by learning vectors for character n-grams, or parts of words. We take advantage of both character- and word-level models to find more effective representations of Arabic affect words in tweets. We evaluate our embeddings by incorporating them into a supervised learning framework for a range of affect tasks. Our models outperform state-of-the-art pre-trained Arabic word embeddings on these tasks. Moreover, they improve on the state of the art for Arabic emotion intensity, outperforming the top-performing systems, which combine deep neural networks with several other features.
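The OOV argument in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the n-gram range (3 to 5), the vector size, the hash-based toy vectors that stand in for learned n-gram vectors, and the use of concatenation to combine the two levels are all assumptions made for illustration, and every function name is hypothetical.

```python
import hashlib

DIM = 8  # toy vector size (assumption; real embeddings are much larger)


def char_ngrams(word, n_min=3, n_max=5):
    """Return the character n-grams of a word padded with boundary markers."""
    padded = f"<{word}>"
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]


def ngram_vector(ngram, dim=DIM):
    """Deterministic toy vector for an n-gram (stand-in for a learned vector)."""
    digest = hashlib.sha256(ngram.encode("utf-8")).digest()
    return [b / 255.0 - 0.5 for b in digest[:dim]]


def char_level_vector(word, dim=DIM):
    """Average the n-gram vectors, so any word gets a vector, even an unseen one."""
    grams = char_ngrams(word)
    if not grams:
        return [0.0] * dim
    vecs = [ngram_vector(g, dim) for g in grams]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def combined_vector(word, word_vectors, dim=DIM):
    """Concatenate the word-level vector (zeros if OOV) with the character-level one."""
    word_part = word_vectors.get(word, [0.0] * dim)
    return word_part + char_level_vector(word, dim)


# A word missing from the (hypothetical) word-level vocabulary still
# receives a representation through its character n-grams.
vocab = {"سعيد": [1.0] * DIM}
in_vocab = combined_vector("سعيد", vocab)
out_of_vocab = combined_vector("سعيييد", vocab)  # elongated, informal spelling
```

In this sketch the elongated spelling shares several n-grams with the standard form, which is the intuition behind using subword units for dialectal and informal text.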

Original language: English
Title of host publication: Natural Language Processing and Information Systems - 25th International Conference on Applications of Natural Language to Information Systems, NLDB 2020, Proceedings
Editors: Elisabeth Métais, Farid Meziane, Helmut Horacek, Philipp Cimiano
Publisher: Springer Vieweg
Pages: 213-224
Number of pages: 12
ISBN (Print): 9783030513092
Publication status: Published - 2020
Event: 25th International Conference on Applications of Natural Language to Information Systems, NLDB 2020 - Saarbrücken, Germany
Duration: 24 Jun 2020 to 26 Jun 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12089 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 25th International Conference on Applications of Natural Language to Information Systems, NLDB 2020
Country/Territory: Germany
City: Saarbrücken
Period: 24/06/20 to 26/06/20

Bibliographical note

Publisher Copyright:
© Springer Nature Switzerland AG 2020.

Copyright:
Copyright 2020 Elsevier B.V., All rights reserved.

Keywords

  • Arabic affect tweets
  • Character-level embeddings
  • Word-level embeddings

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
