Hiding outliers into crowd: Privacy-preserving data publishing with outliers

H. Wang, Ruilin Liu

Research output: Contribution to journal › Article › peer-review

23 Scopus citations

Abstract

In recent years, many organizations have published their data in non-aggregated format for research purposes. However, publishing non-aggregated data raises serious data privacy concerns. One concern is that when outliers exist in the dataset, they are easier to distinguish from the crowd, and their privacy is more likely to be compromised. In this paper, we study the problem of privacy-preserving publishing of datasets that contain outliers. We define the distinguishability-based attack, by which an adversary can identify outliers and reveal their private information from an anonymized dataset. We show that existing syntactic privacy models (e.g., k-anonymity and ℓ-diversity) cannot defend against the distinguishability-based attack. We define plain ℓ-diversity to provide a privacy guarantee for outliers against the distinguishability-based attack, and we design efficient algorithms that anonymize the dataset to achieve plain ℓ-diversity with low information loss. We also extend our anonymization approach to handle continuous release of a series of datasets that contain outliers. Our experiments demonstrate the efficiency and effectiveness of our approaches.
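For context on the syntactic privacy models the abstract refers to, the sketch below illustrates standard k-anonymity and ℓ-diversity checks over the equivalence classes of an anonymized table. It is a minimal illustration of those baseline notions, not the paper's plain ℓ-diversity definition or its anonymization algorithms; the attribute names and toy records are assumptions made for the example.

```python
# Minimal sketch (not the paper's algorithm): checking k-anonymity and
# l-diversity of an anonymized table, where records are grouped into
# equivalence classes by their (generalized) quasi-identifier values.
from collections import defaultdict


def equivalence_classes(records, qi_attrs):
    """Group records by their quasi-identifier values."""
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[a] for a in qi_attrs)].append(r)
    return list(groups.values())


def is_k_anonymous(records, qi_attrs, k):
    """Every equivalence class must contain at least k records."""
    return all(len(g) >= k for g in equivalence_classes(records, qi_attrs))


def is_l_diverse(records, qi_attrs, sensitive, l):
    """Every equivalence class must contain at least l distinct sensitive values."""
    return all(len({r[sensitive] for r in g}) >= l
               for g in equivalence_classes(records, qi_attrs))


# Toy anonymized table: Age and Zip have been generalized (hypothetical data).
table = [
    {"Age": "20-30", "Zip": "100**", "Disease": "Flu"},
    {"Age": "20-30", "Zip": "100**", "Disease": "Cancer"},
    {"Age": "20-30", "Zip": "100**", "Disease": "Flu"},
    {"Age": "30-40", "Zip": "101**", "Disease": "Cold"},
    {"Age": "30-40", "Zip": "101**", "Disease": "Cancer"},
    {"Age": "30-40", "Zip": "101**", "Disease": "Cold"},
]

print(is_k_anonymous(table, ["Age", "Zip"], k=3))          # True
print(is_l_diverse(table, ["Age", "Zip"], "Disease", l=2)) # True
```

The abstract's point is that such checks alone do not protect outliers: a record whose generalized values still set it apart from every other equivalence class can be singled out even when both conditions above hold.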

Original language: English
Pages (from-to): 94-115
Number of pages: 22
Journal: Data and Knowledge Engineering
Volume: 100
DOIs
State: Published - Nov 2015

Keywords

  • Data anonymization
  • Data sharing
  • Integrity and protection
  • Outliers
  • Security
