Abstract
Synthetic data has emerged as a promising avenue for privacy-preserving data sharing. However, constructing synthetic data generators necessitates access to the real dataset, posing challenges, particularly when data features are disparately distributed across different organizations. Vertical Federated Learning (VFL) is a collaborative approach to training machine learning models among distinct tabular data holders, such as financial institutions, who possess disjoint features for the same group of customers. In this paper, we introduce the GTV framework for Generating Tabular Data via Vertical Federated Learning and demonstrate that VFL can be successfully used to implement GANs for distributed tabular data in a privacy-preserving manner, with performance close to that of centralized GANs, which assume shared data. We make design choices with respect to the distribution of GAN generator and discriminator models, and we introduce a training-with-shuffling technique so that no party can reconstruct training data from the GAN conditional vector. The paper presents (1) an implementation of GTV, (2) a detailed quality evaluation of GTV-generated synthetic data, (3) an examination of the GTV framework for different data distributions and numbers of clients, and (4) an analysis of GTV's robustness against Membership Inference Attacks with different settings of Differential Privacy, for a range of datasets with diverse distribution characteristics. Our results demonstrate that GTV can consistently generate high-fidelity synthetic tabular data of comparable quality to that generated by a centralized GAN algorithm. The difference in machine learning utility can be as low as 2.7%, even under extremely imbalanced data distributions across clients. Code is available at: https://github.com/zhao-zilong/gtv
| Original language | English |
|---|---|
| Title of host publication | 2025 55th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2025) |
| Editors | Marcello Cinque, Domenico Cotroneo, Luigi De Simone, Matthias Eckhart, Patrick P. C. Lee, Saman Zonouz |
| Publisher | IEEE |
| Pages | 33-46 |
| Number of pages | 14 |
| ISBN (Electronic) | 9798331512019 |
| ISBN (Print) | 9798331512026 |
| DOIs | |
| Publication status | Published - 11 Jul 2025 |
| Event | 55th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2025 - Naples, Italy. Duration: 23 Jun 2025 → 26 Jun 2025 |
Publication series
| Name | Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN) |
|---|---|
| Publisher | IEEE |
| ISSN (Print) | 1530-0889 |
| ISSN (Electronic) | 2158-3927 |
Conference
| Conference | 55th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2025 |
|---|---|
| Country/Territory | Italy |
| City | Naples |
| Period | 23/06/25 → 26/06/25 |
Bibliographical note
Publisher Copyright: © 2025 IEEE.
Keywords
- GAN
- Privacy-preserving machine learning
- Tabular data
- Vertical Federated Learning
ASJC Scopus subject areas
- Information Systems
- Computer Networks and Communications
- Hardware and Architecture
- Safety, Risk, Reliability and Quality
Fingerprint
Dive into the research topics of 'GTV: Generating Tabular Data via Vertical Federated Learning'. Together they form a unique fingerprint.
Projects
- AGENCY: Assuring Citizen Agency in a World with Complex Online Harms (Finished)
  van Moorsel, A. (Principal Investigator) & Elliott, K. (Co-Investigator)
  Engineering & Physical Science Research Council
  15/07/22 → 31/03/25
  Project: Research Councils