Characterizing Bias in Word Embeddings Towards Analyzing Gender Associations in Philippine Texts

Lance Calvin L. Gamboa, Maria Regina Justina E. Estuar

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The steadily growing body of computational gender bias research has mostly focused on languages for which reliable NLP packages are readily available, such as English, Chinese, and Spanish. This study expands on this area of research by using word embedding bias analysis methods in the Philippine context. To this end, Philippine media textual corpora consisting of 380 million English words and 921 million Filipino words were compiled and used to train FastText embeddings. These embeddings were then subjected to validation and to the Word Embedding Association Test (WEAT) to characterize bias in the embeddings and in the texts they were trained on. Results show that Filipino texts are associated with the heterosexual male by default, but the strongest biases relate to the female and the non-heterosexual. Meanwhile, media texts written in English generally have more balanced gender associations compared to texts written in Filipino. Furthermore, the Filipino corpus links action more to the male, and objects and social roles more to the female. On the other hand, implicitly gendered words in English texts are mostly nouns. These results contribute to demonstrations of how WEAT can be applied in low-resource languages, such as Filipino.
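The abstract names WEAT without stating its formula. As a rough illustration only, the sketch below computes the WEAT effect size as it is commonly defined (Caliskan et al., 2017): each target word is scored by its mean cosine similarity to one attribute set minus its mean cosine similarity to the other, and the difference between the two target-set means is divided by the standard deviation of the scores over all target words. This is not the authors' code. The function names and the two-dimensional toy vectors are assumptions made up for the example, and using the sample standard deviation (`ddof=1`) is one common choice rather than something the abstract specifies. In practice the vectors would be looked up from trained embeddings such as the FastText models described above.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def association(w, A, B):
    """s(w, A, B): mean similarity of w to attribute set A minus that to B."""
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])


def weat_effect_size(X, Y, A, B):
    """Effect size for target sets X, Y against attribute sets A, B."""
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    # Assumption: sample standard deviation over all target-word scores.
    pooled_std = np.std(s_x + s_y, ddof=1)
    return float((np.mean(s_x) - np.mean(s_y)) / pooled_std)


# Hypothetical toy vectors standing in for real word embeddings.
A = [np.array([1.0, 0.0])]                          # attribute set 1
B = [np.array([0.0, 1.0])]                          # attribute set 2
X = [np.array([1.0, 0.0]), np.array([0.9, 0.1])]    # targets near A
Y = [np.array([0.0, 1.0]), np.array([0.1, 0.9])]    # targets near B

effect = weat_effect_size(X, Y, A, B)
print(f"WEAT effect size: {effect:.3f}")
```

A positive value indicates that the X targets sit closer to attribute set A than the Y targets do; swapping X and Y flips the sign.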
Original language: English
Title of host publication: 2023 IEEE World Conference on Applied Intelligence and Computing (AIC)
Publisher: IEEE
Number of pages: 6
ISBN (Electronic): 9798350310061
ISBN (Print): 9798350310078 (PoD)
Publication status: Published - 2 Oct 2023
Event: 2023 IEEE World Conference on Applied Intelligence and Computing - Sonbhadra, India
Duration: 29 Jul 2023 to 30 Jul 2023

Publication series

Name: Applied Intelligence and Computing (AIC), IEEE World Conference on

Conference

Conference: 2023 IEEE World Conference on Applied Intelligence and Computing
Abbreviated title: AIC 2023
Country/Territory: India
City: Sonbhadra
Period: 29/07/23 to 30/07/23

Keywords

  • gender bias
  • sexism
  • Philippines
  • word embeddings
  • natural language processing
  • language models
