CS-Embed at SemEval-2020 Task 9: The effectiveness of code-switched word embeddings for sentiment analysis

Frances Adriana Laureano De Leon, Florimond Gueniat, Harish Tayyar Madabushi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The growing popularity and applications of sentiment analysis of social media posts have naturally led to sentiment analysis of posts written in multiple languages, a practice known as code-switching. While recent research into code-switched posts has focused on the use of multilingual word embeddings, these embeddings were not trained on code-switched data. In this work, we present word embeddings trained on code-switched tweets, specifically those that mix Spanish and English, known as Spanglish. We explore the embedding space to discover how these embeddings capture the meanings of words in both languages. We test their effectiveness by participating in SemEval 2020 Task 9: Sentiment Analysis on Code-Mixed Social Media Text. We utilised them to train a sentiment classifier that achieves an F1 score of 0.722. This is higher than the competition baseline of 0.656, and our team (CodaLab username francesita) ranked 14th out of 29 participating teams.
Original language: English
Title of host publication: Proceedings of the Fourteenth Workshop on Semantic Evaluation
Publisher: International Committee on Computational Linguistics
Pages: 922-927
Publication status: Published - 8 Dec 2020
Event: Proceedings of the Fourteenth Workshop on Semantic Evaluation - Barcelona, Spain
Duration: 12 Dec 2020 - 13 Dec 2020

Publication series

Name: Proceedings of the Fourteenth Workshop on Semantic Evaluation

Workshop

Workshop: Proceedings of the Fourteenth Workshop on Semantic Evaluation
Abbreviated title: SemEval
Country/Territory: Spain
City: Barcelona
Period: 12/12/20 - 13/12/20
