TY - JOUR
T1 - Hiding outliers into crowd
T2 - Privacy-preserving data publishing with outliers
AU - Wang, H.
AU - Liu, Ruilin
N1 - Publisher Copyright:
© 2015 Elsevier B.V. All rights reserved.
PY - 2015/11
Y1 - 2015/11
N2 - In recent years, many organizations have published their data in non-aggregated form for research purposes. However, publishing non-aggregated data raises serious data privacy concerns. One concern is that when outliers exist in the dataset, they are more easily distinguished from the crowd, and their privacy is more likely to be compromised. In this paper, we study the problem of privacy-preserving publishing of datasets that contain outliers. We define the distinguishability-based attack, by which an adversary can identify outliers and reveal their private information from an anonymized dataset. We show that existing syntactic privacy models (e.g., k-anonymity and ℓ-diversity) cannot defend against the distinguishability-based attack. We define plain ℓ-diversity to provide a privacy guarantee for outliers against the distinguishability-based attack, and design efficient algorithms to anonymize the dataset so that it achieves plain ℓ-diversity with low information loss. We extend our anonymization approach to handle the continuous release of a series of datasets that contain outliers. Our experiments demonstrate the efficiency and effectiveness of our approaches.
AB - In recent years, many organizations have published their data in non-aggregated form for research purposes. However, publishing non-aggregated data raises serious data privacy concerns. One concern is that when outliers exist in the dataset, they are more easily distinguished from the crowd, and their privacy is more likely to be compromised. In this paper, we study the problem of privacy-preserving publishing of datasets that contain outliers. We define the distinguishability-based attack, by which an adversary can identify outliers and reveal their private information from an anonymized dataset. We show that existing syntactic privacy models (e.g., k-anonymity and ℓ-diversity) cannot defend against the distinguishability-based attack. We define plain ℓ-diversity to provide a privacy guarantee for outliers against the distinguishability-based attack, and design efficient algorithms to anonymize the dataset so that it achieves plain ℓ-diversity with low information loss. We extend our anonymization approach to handle the continuous release of a series of datasets that contain outliers. Our experiments demonstrate the efficiency and effectiveness of our approaches.
KW - Data anonymization
KW - Data sharing
KW - Integrity and protection
KW - Outliers
KW - Security
UR - http://www.scopus.com/inward/record.url?scp=84946485007&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84946485007&partnerID=8YFLogxK
U2 - 10.1016/j.datak.2015.06.012
DO - 10.1016/j.datak.2015.06.012
M3 - Article
AN - SCOPUS:84946485007
SN - 0169-023X
VL - 100
SP - 94
EP - 115
JO - Data and Knowledge Engineering
JF - Data and Knowledge Engineering
ER -