Cross-lingual Classification of Crisis-related Tweets Using Machine Translation

Shareefa Al Amer, Mark Lee, Phillip Smith

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Multilingual language models such as mBERT and XLM-RoBERTa have gained increasing attention in recent work, which exploits their multilingualism in a range of downstream tasks across different languages. However, cross-lingual transfer learning is expected to degrade performance compared to monolingual training, although this is an acceptable trade-off given the scarcity of resources and the lack of training data for low-resource languages. In this work, we study the effect of machine translation on cross-lingual transfer learning in a crisis event classification task. Our experiments measure the effect of machine-translating the test data into the source language and vice versa. We evaluated and compared performance in terms of accuracy and F1-score. The results show that translating the source data into the target language improves prediction accuracy by 14.8% and the weighted average F1-score by 19.2% compared to zero-shot transfer to an unseen language.
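The abstract reports results as accuracy and weighted average F1-score. As a minimal sketch (not the authors' evaluation code), both metrics can be computed from lists of gold and predicted labels as shown below. The weighted average takes each class's F1 and weights it by that class's number of gold examples (its support). The function names and the crisis labels in the usage example are hypothetical.

```python
from collections import Counter


def accuracy(gold: list[str], pred: list[str]) -> float:
    """Fraction of predictions that exactly match the gold label."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def weighted_f1(gold: list[str], pred: list[str]) -> float:
    """Per-class F1, averaged with weights proportional to each class's gold support."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    support = Counter(gold)  # number of gold examples per class
    total = 0.0
    for label, n in support.items():
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = n - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        total += f1 * n  # weight this class's F1 by its support
    # Classes that appear only in pred have zero support, so they add nothing.
    return total / len(gold)


if __name__ == "__main__":
    # Hypothetical crisis-event labels, for illustration only.
    gold = ["flood", "flood", "fire", "quake"]
    pred = ["flood", "fire", "fire", "quake"]
    print(accuracy(gold, pred))               # → 0.75
    print(round(weighted_f1(gold, pred), 4))  # → 0.75
```

Weighting by support means that frequent event types influence the score more than rare ones. This matters for crisis data, where class distributions are often uneven.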

Original language: English
Title of host publication: Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing
Editors: Ruslan Mitkov, Galia Angelova
Publisher: Incoma Ltd
Pages: 22-31
Number of pages: 10
ISBN (Electronic): 9789544520922
Publication status: Published, 6 Sept 2023
Event: 2023 International Conference Recent Advances in Natural Language Processing: Large Language Models for Natural Language Processing, RANLP 2023, Varna, Bulgaria
Duration: 4 Sept 2023 to 6 Sept 2023

Publication series

Name: International Conference Recent Advances in Natural Language Processing
Publisher: Incoma Ltd
ISSN (Print): 1313-8502
ISSN (Electronic): 2603-2813

Conference

Conference: 2023 International Conference Recent Advances in Natural Language Processing: Large Language Models for Natural Language Processing, RANLP 2023
Country/Territory: Bulgaria
City: Varna
Period: 4/09/23 to 6/09/23

Bibliographical note

Publisher Copyright:
© 2023 Incoma Ltd. All rights reserved.

ASJC Scopus subject areas

  • Software
  • Computer Science Applications
  • Artificial Intelligence
  • Electrical and Electronic Engineering
