An improved privacy attack on smartphones exploiting the accelerometer
Roberto De Prisco;Alfredo De Santis;Delfina Malandrino;Rocco Zaccagnino
2023-01-01
Abstract
We define and implement a novel side-channel attack that exploits a smartphone’s accelerometer to eavesdrop on entire words that the device itself reproduces through its loudspeakers. The proposed approach consists of two modules: (i) a deep learning-based system that uses a Convolutional Neural Network (CNN) to recognize a set of significant speech units from the spectrogram representation of the corresponding acceleration signals; (ii) an evolutionary segmentation method that, given the accelerometer measurements corresponding to an input speech, finds the best way to split them so that the CNN maintains high classification performance on each resulting segment, guaranteeing the recognition of a significant percentage of the words in the original speech. Experiments performed to assess the effectiveness of the attack show that the percentage of recognized words is higher for short speeches and decreases as the speeches get longer. For speeches ranging from 5 to 60 s in length, the recognition percentage goes from about 80% for the shortest speeches down to about 54% for the longest ones.
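The interplay between the two modules can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the evolutionary operators, the fitness function, or the CNN, so everything below is an assumption made for illustration. In particular, `toy_confidence` is a hypothetical stand-in for the CNN's confidence on a segment (it simply prefers segments of a nominal length), and the names `fitness`, `mutate`, and `evolve_segmentation` are invented here. The sketch only shows the general idea of evolving cut points so that a per-segment score stays high.

```python
import random


def toy_confidence(signal, start, end, unit_len=50):
    """Hypothetical placeholder for the CNN's confidence on signal[start:end].

    Assumption: a real attack would score the segment's spectrogram with the
    trained CNN. Here the score just peaks when the segment is unit_len long.
    """
    length = end - start
    return max(0.0, 1.0 - abs(length - unit_len) / unit_len)


def fitness(signal, cuts, confidence=toy_confidence):
    """Mean confidence over the segments induced by the sorted cut points."""
    bounds = [0] + list(cuts) + [len(signal)]
    scores = [confidence(signal, a, b) for a, b in zip(bounds, bounds[1:])]
    return sum(scores) / len(scores)


def mutate(cuts, n, rng, max_shift=10):
    """Shift one random cut, keeping all cuts distinct and inside (0, n)."""
    child = list(cuts)
    i = rng.randrange(len(child))
    shifted = child[i] + rng.randint(-max_shift, max_shift)
    child[i] = min(n - 1, max(1, shifted))
    unique = set(child)
    # If the shifted cut collided with another one, draw a fresh cut.
    while len(unique) < len(cuts):
        unique.add(rng.randrange(1, n))
    return sorted(unique)


def evolve_segmentation(signal, n_cuts, pop_size=30, generations=200, seed=0):
    """Elitist evolutionary search for n_cuts cut points (n_cuts >= 1)."""
    rng = random.Random(seed)
    n = len(signal)
    population = [sorted(rng.sample(range(1, n), n_cuts))
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half unchanged, refill with mutated copies.
        population.sort(key=lambda c: fitness(signal, c), reverse=True)
        survivors = population[: pop_size // 2]
        children = [mutate(rng.choice(survivors), n, rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return max(population, key=lambda c: fitness(signal, c))


if __name__ == "__main__":
    samples = [0.0] * 200  # stand-in for accelerometer measurements
    best_cuts = evolve_segmentation(samples, n_cuts=3)
    print(best_cuts, round(fitness(samples, best_cuts), 3))
```

Because the better half of the population is carried over unchanged at every generation, the best fitness never decreases. Replacing `toy_confidence` with a function that computes a segment's spectrogram and returns a trained classifier's confidence would connect this search to a recognition module of the kind described above.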