
A Comprehensive Study on GDPR-Oriented Analysis of Privacy Policies: Taxonomy, Corpus and GDPR Concept Classifiers

  • Peng Tang
  • Xin Li
  • Yuxin Chen
  • Weidong Qiu*
  • Haochen Mei
  • Allison Holmes
  • Fenghua Li
  • Shujun Li*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Machine learning (ML) based classifiers that take a privacy policy as input and predict relevant concepts are useful in a range of applications, for example (semi-)automated compliance analysis against the requirements of a specific data protection law such as the EU GDPR. Although many researchers have studied ML-based privacy policy concept classifiers, we observed multiple research gaps, e.g., the lack of a more complete GDPR taxonomy and limited consideration of hierarchical information in privacy policies. To fill these gaps, we produced a more complete GDPR-oriented privacy policy concept taxonomy, constructed the first privacy policy corpus with explicit hierarchical information at three levels, and conducted the most comprehensive performance evaluation study of GDPR concept classifiers for privacy policies, covering many aspects that have not been studied systematically. Our work led to multiple findings and insights, including the usefulness of considering hierarchical contextual features and different hierarchical structures, the observation that a “one size fits all” approach may not work, the reduced performance of such classifiers on our newly constructed corpus, especially after the first level, and the necessity of splitting the training and testing sets by documents.
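
The last finding, splitting the training and testing sets by documents, can be illustrated with a minimal sketch. This is not the authors' code: the function name `split_by_document`, the `(doc_id, text, label)` tuple layout, and the toy policies and labels are all assumptions made for illustration. The point it shows is that every segment of a given privacy policy ends up on one side of the split only, presumably so that near-identical passages from the same policy cannot appear in both training and testing data.

```python
import random


def split_by_document(segments, test_fraction=0.2, seed=0):
    """Split (doc_id, text, label) segments so no document spans both sets.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Collect the distinct document identifiers in a deterministic order.
    doc_ids = sorted({doc_id for doc_id, _, _ in segments})

    # Shuffle documents (not segments) with a fixed seed for reproducibility.
    rng = random.Random(seed)
    rng.shuffle(doc_ids)

    # Hold out at least one whole document for testing.
    n_test = max(1, round(len(doc_ids) * test_fraction))
    test_docs = set(doc_ids[:n_test])

    train = [s for s in segments if s[0] not in test_docs]
    test = [s for s in segments if s[0] in test_docs]
    return train, test


# Toy data: four made-up policies with two labelled segments each.
segments = [
    ("policy_a", "We collect your email address.", "data_collection"),
    ("policy_a", "You may request erasure of your data.", "data_subject_rights"),
    ("policy_b", "We collect device identifiers.", "data_collection"),
    ("policy_b", "Data is kept for 12 months.", "retention"),
    ("policy_c", "We share data with advertisers.", "third_party_sharing"),
    ("policy_c", "You may object to processing.", "data_subject_rights"),
    ("policy_d", "Data is kept until account deletion.", "retention"),
    ("policy_d", "We collect location data.", "data_collection"),
]

train, test = split_by_document(segments, test_fraction=0.25, seed=42)
train_docs = {doc_id for doc_id, _, _ in train}
test_docs = {doc_id for doc_id, _, _ in test}
```

A segment-level random split of the same data would typically place segments of one policy in both sets. For larger experiments, scikit-learn's `GroupShuffleSplit` offers a comparable group-aware split, with the document identifier passed as the group.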
Original language: English
Journal: IEEE Transactions on Dependable and Secure Computing
Publication status: E-pub ahead of print - 24 Mar 2026

Keywords

  • Privacy
  • Taxonomy
  • General Data Protection Regulation
  • Law
  • Performance evaluation
  • Regulation
  • Training
  • Annotations
  • Vectors
  • Text categorization
