Combining character and word embeddings for affect in Arabic informal social media microblogs

Abdullah I. Alharbi; Mark Lee

doi:10.1007/978-3-030-51310-8_20

Abdullah I. Alharbi^*, Mark Lee

^*Corresponding author for this work

Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Word representation models have been applied successfully to many natural language processing tasks, including sentiment analysis. However, these models do not always work well in social media contexts. Arabic microblogs such as Twitter involve several linguistic domains, mainly because users write in a variety of dialects. Training word-level models on such informal text can capture words that share the same meaning, but these models cannot represent every word encountered in practice: any word absent from the training vocabulary is out-of-vocabulary (OOV). This inability to represent unseen words is one of the main limitations of word-level models. In contrast, character-level embeddings handle this problem by learning vectors for character n-grams, or parts of words. We take advantage of both character- and word-level models to find more effective ways of representing Arabic affect words in tweets. We evaluate our embeddings by incorporating them into a supervised learning framework for a range of affect tasks. Our models outperform the state-of-the-art Arabic pre-trained word embeddings in these tasks. Moreover, they improve on the state-of-the-art results for Arabic emotion intensity, outperforming the top-performing systems that combine deep neural networks with several other features.
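
The general idea of pairing a word-level lookup with a character n-gram representation can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not say how the two models are combined, so concatenation is an assumption here, and `toy_vector`, `char_level_vector`, `combined_vector`, and the dimensionality `DIM` are hypothetical names. The hash-seeded vectors merely stand in for embeddings that would normally be learned from a tweet corpus.

```python
import hashlib
import random

DIM = 4  # illustrative dimensionality (assumption, not from the paper)


def char_ngrams(word, n_min=3, n_max=5):
    """Return the character n-grams of a word padded with boundary markers."""
    padded = f"<{word}>"
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]


def toy_vector(key, dim=DIM):
    """Deterministic pseudo-random vector standing in for a learned embedding."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def char_level_vector(word, dim=DIM):
    """Average the vectors of the word's character n-grams."""
    grams = char_ngrams(word) or [f"<{word}>"]  # fallback for very short input
    vecs = [toy_vector("ng:" + g, dim) for g in grams]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def combined_vector(word, word_table, dim=DIM):
    """Concatenate the word-level vector (zeros if OOV) with the char-level one."""
    word_part = word_table.get(word, [0.0] * dim)
    return word_part + char_level_vector(word, dim)


# A one-word "vocabulary" and an elongated, informal spelling that is OOV.
known = "سعيد"
elongated = "سعييييد"
word_table = {known: toy_vector("w:" + known)}

shared = set(char_ngrams(known)) & set(char_ngrams(elongated))
vec_known = combined_vector(known, word_table)
vec_oov = combined_vector(elongated, word_table)
```

In this sketch the elongated spelling has no entry in `word_table`, so its word-level half is all zeros, but it still shares several character n-grams with the known word, which is why its character-level half remains informative.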

Original language	English
Title of host publication	Natural Language Processing and Information Systems - 25th International Conference on Applications of Natural Language to Information Systems, NLDB 2020, Proceedings
Editors	Elisabeth Métais, Farid Meziane, Helmut Horacek, Philipp Cimiano
Publisher	Springer Vieweg
Pages	213-224
Number of pages	12
ISBN (Print)	9783030513092
DOIs	https://doi.org/10.1007/978-3-030-51310-8_20
Publication status	Published - 2020
Event	25th International Conference on Applications of Natural Language to Information Systems, NLDB 2020 - Saarbrücken, Germany
Duration	24 Jun 2020 → 26 Jun 2020

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	12089 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	25th International Conference on Applications of Natural Language to Information Systems, NLDB 2020
Country/Territory	Germany
City	Saarbrücken
Period	24/06/20 → 26/06/20

Bibliographical note

Publisher Copyright:
© Springer Nature Switzerland AG 2020.

Copyright:
Copyright 2020 Elsevier B.V., All rights reserved.

Keywords

Arabic affect tweets
Character-level embeddings
Word-level embeddings

ASJC Scopus subject areas

Theoretical Computer Science
Computer Science (all)

Access to Document

10.1007/978-3-030-51310-8_20

Cite this

Alharbi, A. I., & Lee, M. (2020). Combining character and word embeddings for affect in Arabic informal social media microblogs. In E. Métais, F. Meziane, H. Horacek, & P. Cimiano (Eds.), Natural Language Processing and Information Systems - 25th International Conference on Applications of Natural Language to Information Systems, NLDB 2020, Proceedings (pp. 213-224). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12089 LNCS). Springer Vieweg. https://doi.org/10.1007/978-3-030-51310-8_20

Alharbi, Abdullah I. ; Lee, Mark. / Combining character and word embeddings for affect in Arabic informal social media microblogs. Natural Language Processing and Information Systems - 25th International Conference on Applications of Natural Language to Information Systems, NLDB 2020, Proceedings. editor / Elisabeth Métais ; Farid Meziane ; Helmut Horacek ; Philipp Cimiano. Springer Vieweg, 2020. pp. 213-224 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{907a660276f64fd785a5c4c1bbaeafcc,

title = "Combining character and word embeddings for affect in Arabic informal social media microblogs",

abstract = "Word representation models have been successfully applied in many natural language processing tasks, including sentiment analysis. However, these models do not always work effectively in some social media contexts. When considering the use of Arabic in microblogs like Twitter, it is important to note that a variety of different linguistic domains are involved. This is mainly because social media users employ various dialects in their communications. While training word-level models with such informal text can lead to words being captured that have the same meanings, these models cannot capture all words that can be encountered in the real world due to out-of-vocabulary (OOV) words. The inability to identify words is one of the main limitations of this word-level model. In contrast, character-level embeddings can work effectively with this problem through their ability to learn the vectors of character n-grams or parts of words. We take advantage of both character- and word-level models to discover more effective methods to represent Arabic affect words in tweets. We evaluate our embeddings by incorporating them into a supervised learning framework for a range of affect tasks. Our models outperform the state-of-the-art Arabic pre-trained word embeddings in these tasks. Moreover, they offer improved state-of-the-art results for the task of Arabic emotion intensity, outperforming the top-performing systems that employ a combination of deep neural networks and several other features.",

keywords = "Arabic affect tweets, Character-level embeddings, Word-level embeddings",

author = "Alharbi, {Abdullah I.} and Mark Lee",

note = "Publisher Copyright: {\textcopyright} Springer Nature Switzerland AG 2020. Copyright: Copyright 2020 Elsevier B.V., All rights reserved.; 25th International Conference on Applications of Natural Language to Information Systems, NLDB 2020 ; Conference date: 24-06-2020 Through 26-06-2020",

year = "2020",

doi = "10.1007/978-3-030-51310-8_20",

language = "English",

isbn = "9783030513092",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Vieweg",

pages = "213--224",

editor = "Elisabeth M{\'e}tais and Farid Meziane and Helmut Horacek and Philipp Cimiano",

booktitle = "Natural Language Processing and Information Systems - 25th International Conference on Applications of Natural Language to Information Systems, NLDB 2020, Proceedings",

address = "Germany",

}

Alharbi, AI & Lee, M 2020, Combining character and word embeddings for affect in Arabic informal social media microblogs. in E Métais, F Meziane, H Horacek & P Cimiano (eds), Natural Language Processing and Information Systems - 25th International Conference on Applications of Natural Language to Information Systems, NLDB 2020, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 12089 LNCS, Springer Vieweg, pp. 213-224, 25th International Conference on Applications of Natural Language to Information Systems, NLDB 2020, Saarbrücken, Germany, 24/06/20. https://doi.org/10.1007/978-3-030-51310-8_20

Combining character and word embeddings for affect in Arabic informal social media microblogs. / Alharbi, Abdullah I.; Lee, Mark.
Natural Language Processing and Information Systems - 25th International Conference on Applications of Natural Language to Information Systems, NLDB 2020, Proceedings. ed. / Elisabeth Métais; Farid Meziane; Helmut Horacek; Philipp Cimiano. Springer Vieweg, 2020. p. 213-224 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12089 LNCS).

TY - GEN

T1 - Combining character and word embeddings for affect in Arabic informal social media microblogs

AU - Alharbi, Abdullah I.

AU - Lee, Mark

PY - 2020

Y1 - 2020

N2 - Word representation models have been successfully applied in many natural language processing tasks, including sentiment analysis. However, these models do not always work effectively in some social media contexts. When considering the use of Arabic in microblogs like Twitter, it is important to note that a variety of different linguistic domains are involved. This is mainly because social media users employ various dialects in their communications. While training word-level models with such informal text can lead to words being captured that have the same meanings, these models cannot capture all words that can be encountered in the real world due to out-of-vocabulary (OOV) words. The inability to identify words is one of the main limitations of this word-level model. In contrast, character-level embeddings can work effectively with this problem through their ability to learn the vectors of character n-grams or parts of words. We take advantage of both character- and word-level models to discover more effective methods to represent Arabic affect words in tweets. We evaluate our embeddings by incorporating them into a supervised learning framework for a range of affect tasks. Our models outperform the state-of-the-art Arabic pre-trained word embeddings in these tasks. Moreover, they offer improved state-of-the-art results for the task of Arabic emotion intensity, outperforming the top-performing systems that employ a combination of deep neural networks and several other features.

KW - Arabic affect tweets

KW - Character-level embeddings

KW - Word-level embeddings

UR - http://www.scopus.com/inward/record.url?scp=85087533073&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-51310-8_20

DO - 10.1007/978-3-030-51310-8_20

M3 - Conference contribution

AN - SCOPUS:85087533073

SN - 9783030513092

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 213

EP - 224

BT - Natural Language Processing and Information Systems - 25th International Conference on Applications of Natural Language to Information Systems, NLDB 2020, Proceedings

A2 - Métais, Elisabeth

A2 - Meziane, Farid

A2 - Horacek, Helmut

A2 - Cimiano, Philipp

PB - Springer Vieweg

T2 - 25th International Conference on Applications of Natural Language to Information Systems, NLDB 2020

Y2 - 24 June 2020 through 26 June 2020

ER -

Alharbi AI, Lee M. Combining character and word embeddings for affect in Arabic informal social media microblogs. In Métais E, Meziane F, Horacek H, Cimiano P, editors, Natural Language Processing and Information Systems - 25th International Conference on Applications of Natural Language to Information Systems, NLDB 2020, Proceedings. Springer Vieweg. 2020. p. 213-224. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-030-51310-8_20
