An Approach to Trade-off Privacy and Classification Accuracy in Machine Learning Processes
Caruccio L.; Desiato D.; Polese G.; Tortora G.
2023-01-01
Abstract
Machine learning techniques applied to large, distributed data archives can lead to the disclosure of sensitive information. Data often contain sensitive identifying information, and even when that information is protected, the considerable processing power of current machine learning techniques can make it easier to identify individuals. This discussion paper presents a decision-support framework for data anonymization. The framework relies on a novel approach that exploits data correlations, expressed as relaxed functional dependencies (rfds), to identify anonymization strategies that offer suitable trade-offs between privacy and data utility. It can also generate anonymization strategies that leverage multiple data correlations simultaneously to increase the utility of the anonymized datasets. In addition, our framework supports the selection of anonymization strategies by making explicit the trade-offs between privacy and data utility that each obtained strategy offers. Experiments on real-life datasets show that our approach achieves promising data utility while guaranteeing the desired privacy level. It also allows data owners to select anonymization strategies that balance their privacy and data utility requirements.