An automatic mechanism to provide privacy awareness and control over unwittingly dissemination of online private information
Guarino A.;Malandrino D.;Zaccagnino R.
2022-01-01
Abstract
Given the increasing popularity of social media and other Internet-related technologies, individuals spend a lot of time on online activities such as searching on Google, making credit card purchases, interacting on social networks, looking for jobs, or planning travel. Unfortunately, in doing so they often unwittingly disseminate a large amount of personal and sensitive information that represents an essential part of their private life. A large part of this information is embedded in text messages typed during online activities. There is therefore an increasing need for mechanisms that assist individuals during such activities by raising their awareness of potential privacy violations at the time of disclosure. It is equally essential to give them full control over whether and how to manage their data, thereby empowering them to make heedful decisions. Awareness can be realized through simple alert/highlight mechanisms, while full control can be ensured by letting users make the final choice: either ignore the warnings, or accept them and then (a) think twice before disseminating the data (to avoid future regrets), or (b) send the data anyway, but only after anonymizing it. In this paper, we propose a novel approach based on machine learning and sentence embedding techniques with the primary goal of providing privacy awareness to users and, as a consequence, full control over their data during online activities.
Our approach relies on four modules: (i) the Keyword module, which identifies personal and sensitive data in a text from the syntactic point of view; (ii) the Topic module, which determines the topic treated in text messages; (iii) the Sensitiveness module, which identifies sensitive information in text messages from the semantic point of view; and (iv) the Personalization module, whose goal is to learn the personal attitude of a user towards his/her own privacy (through appropriate feedback) and to report the correct alert messages accordingly. We implemented this approach, named Knoxly, as a prototype Google Chrome extension. The tool has undergone a preliminary experimental study to assess its effectiveness, in terms of accuracy in identifying sensitive information, and its efficiency, in terms of impact on user experience.
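To make the four-module structure concrete, the following is a minimal sketch of how such a pipeline could be wired together. It is not Knoxly's implementation: every name, pattern, word list, and threshold below is a hypothetical chosen for illustration, and the Sensitiveness step uses a plain bag-of-words cosine similarity as a stand-in for the sentence embeddings the abstract mentions.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field

# Keyword module (syntactic view): hypothetical regex patterns for personal data.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "card_number": re.compile(r"\b\d{13,16}\b"),
}

# Topic module: hypothetical per-topic vocabularies (assumption, not from the paper).
TOPIC_TERMS = {
    "health": {"diagnosed", "doctor", "therapy", "illness"},
    "finance": {"salary", "loan", "debt", "bank"},
}

# Sensitiveness module: hypothetical reference sentences per topic.
SENSITIVE_EXAMPLES = {
    "health": ["i was diagnosed with a serious illness"],
    "finance": ["i have a large debt with my bank"],
}


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def find_keywords(text):
    """Return (label, matched_text) pairs for syntactically recognizable data."""
    return [(label, m.group()) for label, rx in PATTERNS.items() for m in rx.finditer(text)]


def detect_topic(text):
    """Pick the topic whose vocabulary overlaps most with the text, else 'general'."""
    tokens = set(tokenize(text))
    scores = {topic: len(tokens & terms) for topic, terms in TOPIC_TERMS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "general"


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counter objects)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def sensitiveness(text, topic):
    """Semantic stand-in: highest similarity to any reference sentence of the topic."""
    vec = Counter(tokenize(text))
    examples = SENSITIVE_EXAMPLES.get(topic, [])
    return max((cosine(vec, Counter(tokenize(e))) for e in examples), default=0.0)


@dataclass
class Personalization:
    """Per-user alert thresholds, adjusted from feedback on past warnings."""
    thresholds: dict = field(default_factory=dict)
    default: float = 0.5
    step: float = 0.1

    def threshold(self, topic):
        return self.thresholds.get(topic, self.default)

    def feedback(self, topic, accepted):
        # An accepted warning lowers the threshold (warn more often);
        # an ignored warning raises it (warn less often).
        delta = -self.step if accepted else self.step
        self.thresholds[topic] = min(1.0, max(0.0, self.threshold(topic) + delta))


def analyze(text, profile):
    """Run the four modules in sequence and decide whether to raise an alert."""
    topic = detect_topic(text)
    score = sensitiveness(text, topic)
    return {
        "keywords": find_keywords(text),
        "topic": topic,
        "score": score,
        "alert": score >= profile.threshold(topic),
    }
```

In this sketch, calling `analyze(message, Personalization())` returns the syntactic matches, the detected topic, a similarity score, and an alert flag. Calling `feedback(topic, accepted=False)` after a user dismisses a warning makes later alerts for that topic less likely, which is one simple way to model the feedback-driven behavior attributed to the Personalization module.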