GTV: Generating Tabular Data via Vertical Federated Learning

  • Zilong Zhao
  • Han Wu*
  • Aad van Moorsel
  • Lydia Y. Chen
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Synthetic data has emerged as a promising avenue for privacy-preserving data sharing. However, building a synthetic data generator requires access to the real dataset, which is challenging when the data features are split across different organizations. Vertical Federated Learning (VFL) is a collaborative approach to training machine learning models among distinct tabular data holders, such as financial institutions, that hold disjoint features for the same group of customers. In this paper, we introduce the GTV framework for Generating Tabular Data via Vertical Federated Learning and demonstrate that VFL can be used to train GANs on distributed tabular data in a privacy-preserving manner, with performance close to that of centralized GANs, which assume shared data. We make design choices about how the GAN generator and discriminator models are distributed among the parties, and we introduce a training-with-shuffling technique so that no party can reconstruct training data from the GAN conditional vector. For a range of datasets with diverse distribution characteristics, the paper presents (1) an implementation of GTV, (2) a detailed quality evaluation of GTV-generated synthetic data, (3) an examination of the GTV framework under different data distributions and numbers of clients, and (4) an analysis of GTV's robustness against Membership Inference Attacks under different Differential Privacy settings. Our results demonstrate that GTV consistently generates high-fidelity synthetic tabular data of quality comparable to that of a centralized GAN algorithm. The difference in machine learning utility relative to the centralized baseline can be as low as 2.7%, even under extremely imbalanced data distributions across clients. Code is available at: https://github.com/zhao-zilong/gtv
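To make the vertical data layout described in the abstract concrete, the following is a minimal illustrative sketch, not the GTV implementation. It assumes each client owns a fixed subset of columns for the same customers (row i is the same customer at every client) and that all clients agree on a shared random seed for each training round. The helper names `vertical_partition` and `shared_shuffle` and the seed-agreement step are hypothetical. The sketch only shows that applying one common permutation to every client's rows keeps the vertically split records aligned while changing their order. It does not reproduce GTV's conditional-vector handling or its training-with-shuffling protocol.

```python
import random
from typing import Dict, List, Sequence

Row = List[object]


def vertical_partition(rows: Sequence[Row], column_groups: Dict[str, List[int]]) -> Dict[str, List[Row]]:
    """Hypothetical helper: give each client only its own columns.

    Row i of every client's partition describes the same customer.
    """
    return {
        client: [[row[c] for c in cols] for row in rows]
        for client, cols in column_groups.items()
    }


def shared_shuffle(partitions: Dict[str, List[Row]], seed: int) -> Dict[str, List[Row]]:
    """Hypothetical helper: apply ONE seed-derived permutation to every client's rows.

    Assumption: all clients know the same per-round seed, so each can compute
    the permutation locally and the rows stay aligned across clients.
    """
    if not partitions:
        return {}
    sizes = {len(part) for part in partitions.values()}
    if len(sizes) != 1:
        raise ValueError("all clients must hold the same number of rows")
    n = sizes.pop()
    perm = list(range(n))
    random.Random(seed).shuffle(perm)  # deterministic for a given seed
    return {client: [part[i] for i in perm] for client, part in partitions.items()}
```

In the example below, a hypothetical bank owns columns 0 and 1 and a hypothetical insurer owns columns 2 and 3. After `shared_shuffle(parts, seed=7)`, concatenating the i-th row of each client still yields a complete original record, only in a different order. Using a fresh seed each round would change that order from round to round.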

Original language: English
Title of host publication: 2025 55th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2025)
Editors: Marcello Cinque, Domenico Cotroneo, Luigi De Simone, Matthias Eckhart, Patrick P. C. Lee, Saman Zonouz
Publisher: IEEE
Pages: 33-46
Number of pages: 14
ISBN (Electronic): 9798331512019
ISBN (Print): 9798331512026
Publication status: Published - 11 Jul 2025
Event: 55th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2025 - Naples, Italy
Duration: 23 Jun 2025 to 26 Jun 2025

Publication series

Name: Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)
Publisher: IEEE
ISSN (Print): 1530-0889
ISSN (Electronic): 2158-3927

Conference

Conference: 55th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2025
Country/Territory: Italy
City: Naples
Period: 23/06/25 to 26/06/25

Bibliographical note

Publisher Copyright: © 2025 IEEE.

Keywords

  • GAN
  • Privacy-preserving machine learning
  • Tabular data
  • Vertical Federated Learning

ASJC Scopus subject areas

  • Information Systems
  • Computer Networks and Communications
  • Hardware and Architecture
  • Safety, Risk, Reliability and Quality
