Relative privacy threats and learning from anonymized data

Boreale, M.; Corradi, F.; Viscardi, C.

doi:10.1109/TIFS.2019.2937640

We consider group-based anonymization schemes, a popular approach to data publishing. This approach aims at protecting privacy of the individuals involved in a dataset, by releasing an obfuscated version of the original data, where the exact correspondence between individuals and attribute values is hidden. When publishing data about individuals, one must typically balance the learner's utility against the risk posed by an attacker, potentially targeting individuals in the dataset. Accordingly, we propose a unified Bayesian model of group-based schemes and a related MCMC methodology to learn the population parameters from an anonymized table. This allows one to analyze the risk for any individual in the dataset to be linked to a specific sensitive value, when the attacker knows the individual's nonsensitive attributes, beyond what is implied for the general population. We call this relative threat analysis. Finally, we illustrate the results obtained with the proposed methodology on a real-world dataset.