Fair and Private Data Preprocessing through Microaggregation

Vladimiro González-Zelaya; Julián Salas; David Megías; Paolo Missier

doi:10.1145/3617377

Computer Science

Research output: Contribution to journal › Article › peer-review

Abstract

Privacy protection for personal data and fairness in automated decisions are fundamental requirements for responsible Machine Learning. Both may be enforced through data preprocessing and share a common target: data should remain useful for a task, while becoming uninformative of the sensitive information. The intrinsic connection between privacy and fairness implies that modifications performed to guarantee one of these goals, may have an effect on the other, e.g., hiding a sensitive attribute from a classification algorithm might prevent a biased decision rule having such attribute as a criterion. This work resides at the intersection of algorithmic fairness and privacy. We show how the two goals are compatible, and may be simultaneously achieved, with a small loss in predictive performance. Our results are competitive with both state-of-the-art fairness correcting algorithms and hybrid privacy-fairness methods. Experiments were performed on three widely used benchmark datasets: Adult Income, COMPAS, and German Credit.

Original language	English
Article number	49
Number of pages	24
Journal	ACM Transactions on Knowledge Discovery from Data
Volume	18
Issue number	3
Early online date	24 Aug 2023
DOIs	https://doi.org/10.1145/3617377
Publication status	Published - Apr 2024

Keywords

Responsible machine learning
fair classification
privacy preserving data mining
algorithmic fairness
ethical AI

Access to Document

10.1145/3617377. Licence: Creative Commons: Attribution (CC BY)

GonzálezZelayaV2023Fair: Final published version, 1.45 MB. Licence: Creative Commons: Attribution (CC BY)

Cite this

@article{6c07641ce2b2488e897bba51ebcd0082,
  title = "Fair and Private Data Preprocessing through Microaggregation",
  abstract = "Privacy protection for personal data and fairness in automated decisions are fundamental requirements for responsible Machine Learning. Both may be enforced through data preprocessing and share a common target: data should remain useful for a task, while becoming uninformative of the sensitive information. The intrinsic connection between privacy and fairness implies that modifications performed to guarantee one of these goals, may have an effect on the other, e.g., hiding a sensitive attribute from a classification algorithm might prevent a biased decision rule having such attribute as a criterion. This work resides at the intersection of algorithmic fairness and privacy. We show how the two goals are compatible, and may be simultaneously achieved, with a small loss in predictive performance. Our results are competitive with both state-of-the-art fairness correcting algorithms and hybrid privacy-fairness methods. Experiments were performed on three widely used benchmark datasets: Adult Income, COMPAS, and German Credit.",
  keywords = "Responsible machine learning, fair classification, privacy preserving data mining, algorithmic fairness, ethical AI",
  author = "Vladimiro Gonz{\'a}lez-Zelaya and Juli{\'a}n Salas and David Meg{\'i}as and Paolo Missier",
  year = "2024",
  month = apr,
  doi = "10.1145/3617377",
  language = "English",
  volume = "18",
  journal = "ACM Transactions on Knowledge Discovery from Data",
  issn = "1556-4681",
  publisher = "Association for Computing Machinery (ACM)",
  number = "3",
}

TY - JOUR

T1 - Fair and Private Data Preprocessing through Microaggregation

AU - González-Zelaya, Vladimiro

AU - Salas, Julián

AU - Megías, David

AU - Missier, Paolo

PY - 2024/4

Y1 - 2024/4

N2 - Privacy protection for personal data and fairness in automated decisions are fundamental requirements for responsible Machine Learning. Both may be enforced through data preprocessing and share a common target: data should remain useful for a task, while becoming uninformative of the sensitive information. The intrinsic connection between privacy and fairness implies that modifications performed to guarantee one of these goals, may have an effect on the other, e.g., hiding a sensitive attribute from a classification algorithm might prevent a biased decision rule having such attribute as a criterion. This work resides at the intersection of algorithmic fairness and privacy. We show how the two goals are compatible, and may be simultaneously achieved, with a small loss in predictive performance. Our results are competitive with both state-of-the-art fairness correcting algorithms and hybrid privacy-fairness methods. Experiments were performed on three widely used benchmark datasets: Adult Income, COMPAS, and German Credit.

AB - Privacy protection for personal data and fairness in automated decisions are fundamental requirements for responsible Machine Learning. Both may be enforced through data preprocessing and share a common target: data should remain useful for a task, while becoming uninformative of the sensitive information. The intrinsic connection between privacy and fairness implies that modifications performed to guarantee one of these goals, may have an effect on the other, e.g., hiding a sensitive attribute from a classification algorithm might prevent a biased decision rule having such attribute as a criterion. This work resides at the intersection of algorithmic fairness and privacy. We show how the two goals are compatible, and may be simultaneously achieved, with a small loss in predictive performance. Our results are competitive with both state-of-the-art fairness correcting algorithms and hybrid privacy-fairness methods. Experiments were performed on three widely used benchmark datasets: Adult Income, COMPAS, and German Credit.

KW - Responsible machine learning

KW - fair classification

KW - privacy preserving data mining

KW - algorithmic fairness

KW - ethical AI

U2 - 10.1145/3617377

DO - 10.1145/3617377

M3 - Article

SN - 1556-4681

VL - 18

JO - ACM Transactions on Knowledge Discovery from Data

JF - ACM Transactions on Knowledge Discovery from Data

IS - 3

M1 - 49

ER -
