Abstract
Differentially private decision tree algorithms have been popular since the introduction of differential privacy. While many private tree-based algorithms have been proposed for supervised learning tasks, such as classification, very few extend naturally to the semi-supervised setting. In this paper, we present a framework that takes advantage of unlabelled data to reduce the noise requirement in differentially private decision forests and improve their predictive performance. The main ingredients of our approach are a median splitting criterion that creates balanced leaves, a geometric privacy budget allocation technique, and a random sampling technique to compute the private splitting point accurately. While similar ideas have existed in isolation, their combination is new and has several advantages: (1) The semi-supervised mode of operation comes for free. (2) Our framework is applicable in two different privacy settings: when label privacy is required, and when privacy of the features is also required. (3) Empirical evidence on 18 UCI data sets and 3 synthetic data sets demonstrates that our algorithm achieves high utility relative to the current state of the art in both supervised and semi-supervised classification problems.
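To make two of the ingredients named in the abstract more concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a geometric budget split in which each tree level receives `ratio` times the budget of the level above it (the ratio and its direction are assumptions), and it substitutes the standard exponential mechanism for the paper's random sampling technique when picking a private median split point. The function names `geometric_budget` and `private_median` are hypothetical.

```python
import math
import random


def geometric_budget(total_eps, depth, ratio=2.0):
    """Split total_eps over `depth` tree levels with geometrically growing shares.

    Assumption: level i gets a share proportional to ratio**i. The shares sum
    to total_eps, so sequential composition over the levels spends total_eps.
    """
    weights = [ratio ** i for i in range(depth)]
    norm = sum(weights)
    return [total_eps * w / norm for w in weights]


def private_median(values, lo, hi, eps, rng):
    """Exponential-mechanism median on the bounded domain [lo, hi].

    Candidate outputs are grouped into the intervals between consecutive
    sorted points. Every point of interval i has i data points below it, so
    its utility is -|i - n/2|. A sensitivity of 1 is assumed for this utility.
    """
    xs = sorted(min(max(v, lo), hi) for v in values)  # clip to the domain
    edges = [lo] + xs + [hi]
    n = len(xs)
    log_w = []
    for i in range(n + 1):
        length = edges[i + 1] - edges[i]
        if length <= 0:
            log_w.append(-math.inf)  # empty interval can never be selected
        else:
            utility = -abs(i - n / 2)
            log_w.append(math.log(length) + eps * utility / 2.0)
    top = max(log_w)
    if top == -math.inf:  # degenerate domain with lo == hi
        return lo
    weights = [math.exp(w - top) for w in log_w]  # shift for numerical stability
    i = rng.choices(range(n + 1), weights=weights, k=1)[0]
    return rng.uniform(edges[i], edges[i + 1])


if __name__ == "__main__":
    rng = random.Random(0)
    per_level = geometric_budget(total_eps=1.0, depth=4)
    feature = [rng.gauss(0.0, 1.0) for _ in range(200)]
    split = private_median(feature, lo=-5.0, hi=5.0, eps=per_level[0], rng=rng)
    print(per_level, split)
```

A median split sends roughly half of the points to each child, which is what keeps the leaves balanced. In the label-privacy setting described in the abstract, the split points depend only on features, so under that reading the budget would be needed only for the labels; the sketch above corresponds to the setting where the features are also private.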
Original language | English |
---|---|
Title of host publication | Machine Learning and Knowledge Discovery in Databases |
Subtitle of host publication | European Conference, ECML PKDD 2022, Grenoble, France, September 19–23, 2022, Proceedings, Part IV |
Editors | Massih-Reza Amin, Stéphane Canu, Asja Fischer, Tias Guns, Petra Kralj Novak, Grigorios Tsoumakas |
Place of Publication | Cham |
Publisher | Springer |
Pages | 587–603 |
Number of pages | 17 |
Edition | 1 |
ISBN (Electronic) | 9783031264122 |
ISBN (Print) | 9783031264115 |
DOIs | |
Publication status | Published - 17 Mar 2023 |
Event | European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases - Grenoble, France. Duration: 19 Sept 2022 → 23 Sept 2022. https://2022.ecmlpkdd.org/ |
Publication series
Name | Lecture Notes in Computer Science |
---|---|
Publisher | Springer |
Volume | 13716 |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases |
---|---|
Abbreviated title | ECML PKDD 2022 |
Country/Territory | France |
City | Grenoble |
Period | 19/09/22 → 23/09/22 |
Internet address |
Keywords
- Differential privacy
- Noise reduction
- Ensembles