
FLea: Addressing Data Scarcity and Label Skew in Federated Learning via Privacy-preserving Feature Augmentation

  • Tong Xia*
  • Abhirup Ghosh
  • Xinchi Qiu
  • Cecilia Mascolo

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution


Abstract

Federated Learning (FL) enables model development by leveraging data distributed across numerous edge devices without transferring local data to a central server. However, existing FL methods still struggle when data across devices is scarce and label-skewed, which causes local model overfitting and drift and in turn degrades the performance of the global model. In response to these challenges, we propose a framework called FLea, incorporating the following key components: i) A global feature buffer that stores activation-target pairs shared from multiple clients to support local training. This design mitigates local model drift caused by the absence of certain classes; ii) A feature augmentation approach based on local and global activation mix-ups for local training. This strategy enlarges the set of training samples, thereby reducing the risk of local overfitting; iii) An obfuscation method to minimize the correlation between intermediate activations and the source data, enhancing the privacy of shared features. To verify the superiority of FLea, we conduct extensive experiments using a wide range of data modalities, simulating different levels of local data scarcity and label skew. The results demonstrate that FLea consistently outperforms state-of-the-art FL counterparts (in 13 of the 18 experimental settings, the improvement exceeds 5%) while concurrently mitigating the privacy vulnerabilities associated with shared features. Code is available at https://github.com/XTxiatong/FLea.git.
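As a rough illustration of component ii), the sketch below mixes each local activation with an activation drawn from a shared buffer, and mixes the targets in the same proportion. It is not the authors' implementation: the function name `mix_activations`, its parameters, the NumPy representation, and the Beta(alpha, alpha) mixing weight (borrowed from standard mixup) are all assumptions made for illustration. The obfuscation step of component iii) is not shown.

```python
import numpy as np


def mix_activations(local_feats, local_labels, global_feats, global_labels,
                    num_classes, alpha=2.0, rng=None):
    """Mix each local activation with a randomly drawn buffer activation.

    Hypothetical helper: the names and the Beta(alpha, alpha) sampling are
    illustrative assumptions (standard mixup), not taken from the FLea code.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = local_feats.shape[0]

    # Pick one buffer entry per local sample (with replacement).
    idx = rng.integers(0, global_feats.shape[0], size=n)

    # One mixing weight per sample, broadcast over the feature dimensions.
    lam = rng.beta(alpha, alpha, size=n)
    lam_feat = lam.reshape(n, *([1] * (local_feats.ndim - 1)))

    mixed_feats = lam_feat * local_feats + (1.0 - lam_feat) * global_feats[idx]

    # Soft targets: the same convex combination applied to one-hot labels.
    eye = np.eye(num_classes)
    mixed_labels = (lam[:, None] * eye[local_labels]
                    + (1.0 - lam[:, None]) * eye[global_labels[idx]])
    return mixed_feats, mixed_labels


# Toy usage: the client holds only classes 0 and 1, the buffer covers 0..2.
rng = np.random.default_rng(0)
local_feats = rng.normal(size=(4, 8))
local_labels = np.array([0, 0, 1, 1])
global_feats = rng.normal(size=(10, 8))
global_labels = rng.integers(0, 3, size=10)

feats, targets = mix_activations(local_feats, local_labels,
                                 global_feats, global_labels,
                                 num_classes=3, rng=rng)
```

Because the buffer can contain classes that are missing locally, the soft targets can put weight on those classes, which is the intuition behind using shared features to counter label skew.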
Original language: English
Title of host publication: KDD '24
Subtitle of host publication: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery (ACM)
Pages: 3484-3494
Number of pages: 11
ISBN (Print): 9798400704901
DOIs
Publication status: Published - 24 Aug 2024
Event: 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining - Barcelona, Spain
Duration: 25 Aug 2024 to 29 Aug 2024

Publication series

Name: Proceedings of the International Conference on Knowledge Discovery and Data Mining
Publisher: ACM
ISSN (Print): 2154-817X

Conference

Conference: 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Abbreviated title: KDD '24
Country/Territory: Spain
City: Barcelona
Period: 25/08/24 to 29/08/24

Keywords

  • Federated learning
  • data scarcity
  • label skew
  • data privacy
