
Machine Learning Methodologies for Preventing Malware Obfuscation

Carletti V.;Saggese A.;Foggia P.;Greco A.;Vento M.
2023-01-01

Abstract

Malware is a serious threat in a world where IoT devices are becoming increasingly pervasive: every day, new and more sophisticated malware can rely on an attack surface that grows with the number of new devices coming to market. There is a constant competition between malware detection systems, which must adapt their knowledge base and heuristics day by day, and malware writers, who must find new techniques to evade these systems. In this scenario, machine learning methods are the best candidates to face the continuous evolution of malware; this justifies the increasing interest in such approaches for building antimalware systems that are able to learn and adapt. However, it is still an open question how robust machine learning-based systems are against obfuscation techniques: methods whose effectiveness depends on what they learn from a training set are potentially vulnerable to code modifications that alter the probability distribution of the features observed during training. In this paper we compare seven different methods trained to classify malware, paying specific attention to the recent image-based approaches. The comparison has been conducted using one of the largest malware datasets publicly released to date, namely SOREL-20M, which is composed of more than 20 million samples divided into 11 malware families. In the proposed analysis, we have considered four basic obfuscation techniques based on appending a sequence of bytes to the end of the executable; they are very easy for a malware writer to implement. All the tested methods achieved very high accuracy on unmodified test samples, but only a few of them proved able to withstand the considered obfuscation techniques.
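As a rough illustration of why appending bytes can matter, the following minimal sketch pads a toy byte string and measures how far its byte-value histogram moves. Everything in it is an assumption made for illustration: the abstract does not specify the four obfuscation techniques or the features used by the seven methods, so the zero and random padding, the `byte_histogram` feature, and the helper names are hypothetical stand-ins rather than the authors' setup.

```python
import random
from collections import Counter


def byte_histogram(data: bytes) -> list[float]:
    """Normalized frequency of each of the 256 possible byte values."""
    counts = Counter(data)
    total = len(data) or 1  # avoid division by zero on empty input
    return [counts.get(value, 0) / total for value in range(256)]


def append_padding(data: bytes, n: int, kind: str = "zeros", seed: int = 0) -> bytes:
    """Return `data` with `n` extra bytes appended; the original bytes are untouched.

    The two padding kinds are hypothetical examples, not the paper's techniques.
    """
    if kind == "zeros":
        pad = bytes(n)
    elif kind == "random":
        rng = random.Random(seed)
        pad = bytes(rng.randrange(256) for _ in range(n))
    else:
        raise ValueError(f"unknown padding kind: {kind}")
    return data + pad


def l1_distance(p: list[float], q: list[float]) -> float:
    """Sum of absolute differences between two histograms."""
    return sum(abs(a - b) for a, b in zip(p, q))


# Toy stand-in for an executable: every byte value appears exactly four times.
original = bytes(range(256)) * 4
padded = append_padding(original, n=1024, kind="zeros")

shift = l1_distance(byte_histogram(original), byte_histogram(padded))
print(f"L1 shift in byte histogram after zero padding: {shift:.4f}")
```

The original content is left intact, so the program's behavior would not change, yet a feature computed over the whole file shifts substantially. This is the kind of change in the observed feature distribution that the abstract describes as a potential weakness of learned classifiers.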

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4825392
