Due to their open nature and popularity, Android-based devices have attracted several end-users around the World and are one of the main targets for attackers. Because of the reasons given above, it is necessary to build tools that can reliably detect zero-day malware on these devices. At the moment, many of the frameworks that have been proposed to detect malware applications leverage Machine Learning (ML) techniques. However, an essential requirement to build these frameworks consists of using very large and sophisticated datasets for model construction and training purposes. Their success, indeed, strongly depends on the choice of the right features used for building a classification model providing adequate generalisation capability. Furthermore, the creation of a training dataset that well represents the malware properties and behaviour is one of the most critical challenges in malware analysis. Therefore, the main aim of this paper is proposing a new dataset called Unisa Malware Dataset (UMD) available on http://antlab.di.unisa.it/malware/, which is based on the extraction of static and dynamic features characterising the malware activities. Additionally, we will show some experiments concerning common ML tools to demonstrate how it is possible to build efficient ML-based malware classification frameworks using the proposed dataset.

Effective classification of android malware families through dynamic features and neural networks

D'Angelo G.;Palmieri F.;Robustelli A.;Castiglione A.
2021-01-01

Abstract

Due to their open nature and popularity, Android-based devices have attracted several end-users around the World and are one of the main targets for attackers. Because of the reasons given above, it is necessary to build tools that can reliably detect zero-day malware on these devices. At the moment, many of the frameworks that have been proposed to detect malware applications leverage Machine Learning (ML) techniques. However, an essential requirement to build these frameworks consists of using very large and sophisticated datasets for model construction and training purposes. Their success, indeed, strongly depends on the choice of the right features used for building a classification model providing adequate generalisation capability. Furthermore, the creation of a training dataset that well represents the malware properties and behaviour is one of the most critical challenges in malware analysis. Therefore, the main aim of this paper is proposing a new dataset called Unisa Malware Dataset (UMD) available on http://antlab.di.unisa.it/malware/, which is based on the extraction of static and dynamic features characterising the malware activities. Additionally, we will show some experiments concerning common ML tools to demonstrate how it is possible to build efficient ML-based malware classification frameworks using the proposed dataset.
2021
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11386/4763327
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 31
  • ???jsp.display-item.citation.isi??? 24
social impact