Hybrid and lightweight detection of third party tracking: Design, implementation, and evaluation

Federico Cozza;Alfonso Guarino;Delfina Malandrino;Antonio Rapuano;Raffaele Schiavone;Rocco Zaccagnino
2020-01-01

Abstract

A common practice for websites is to rely on services provided by third-party sites to track users and provide personalized experiences. Unfortunately, this practice has strong implications for both users and performance. On the one hand, the privacy of individuals is at risk, because valuable information can be used to reconstruct personal profiles. On the other hand, many existing privacy countermeasures implemented in Web browsers exhibit performance issues, mainly because they rely on huge lists, difficult to keep up to date, of privacy-intrusive resources that have to be filtered out. To overcome these limitations, we propose a hybrid mechanism that combines blacklisting and machine learning to automatically identify privacy-intrusive services requested while browsing Web pages. The idea is to use blacklisting, the technique adopted by the majority of privacy tools, in combination with a machine learning model that distinguishes between malicious and functional resources and updates the blacklist accordingly. We found that machine learning models can classify JavaScript programs and HTTP requests with accuracy of up to 91% and 97%, respectively. We provide a prototype implementation of this hybrid mechanism, named GuardOne, and we performed an exhaustive evaluation study to assess its effectiveness and performance. The results show that GuardOne filters out malicious resources from users' requests without performance degradation compared with traditional systems that rely on static lists for filtering. Moreover, the effectiveness results show that our mechanism, though open to some small improvements, efficiently filters out malicious requests and substantially reduces personal information leakage.
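The hybrid idea described in the abstract (consult the blacklist first, fall back to a classifier, and add newly flagged resources to the blacklist) can be illustrated with a minimal Python sketch. This is not GuardOne's implementation: the class `HybridFilter`, the function `toy_classifier`, and the keyword list are hypothetical, and the keyword heuristic merely stands in for a trained machine learning model.

```python
from typing import Callable, Set
from urllib.parse import urlparse


def toy_classifier(url: str) -> bool:
    """Hypothetical stand-in for a trained model: flags tracker-like URLs."""
    tokens = ("track", "pixel", "analytics", "beacon")
    lowered = url.lower()
    return any(token in lowered for token in tokens)


class HybridFilter:
    """Blacklist lookup first; classifier fallback that grows the blacklist."""

    def __init__(self, blacklist: Set[str], classifier: Callable[[str], bool]):
        self.blacklist = set(blacklist)  # copy so the caller's set is untouched
        self.classifier = classifier

    def should_block(self, url: str) -> bool:
        host = urlparse(url).hostname or ""
        # Fast path: the host is already known to be privacy intrusive.
        if host in self.blacklist:
            return True
        # Slow path: ask the classifier, and remember a positive verdict.
        if self.classifier(url):
            if host:
                self.blacklist.add(host)
            return True
        return False
```

In this sketch, a host flagged once by the classifier is blocked by the cheap set lookup on later requests, which is the intuition behind keeping the list small and self-updating rather than shipping a large static list.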
Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4731033