Mel Spectrogram-Based CNN Framework for Explainable Audio Deepfake Detection

Bajwa, M. Z.; Castiglione, A.; Pero, C.

doi:10.1007/978-3-031-87784-1_37

Audio deepfakes are a growing concern for media credibility, particularly on social platforms. This study explores the detection of audio deepfakes using Convolutional Neural Networks (CNNs) applied to Mel spectrograms, which serve as visual representations of audio signals. Six CNN architectures (VGG16, VGG19, ResNet50, DenseNet121, MobileNetV2, and EfficientNetB0) were evaluated on the FakeAVCelebV2 dataset using metrics such as precision, recall, F1-score, and accuracy. To provide better insight into model decisions, Grad-CAM, an Explainable Artificial Intelligence (XAI) technique, was employed to highlight the spectrogram regions most relevant for distinguishing real from fake audio. The study also tested the model’s performance with added Gaussian and white noise to assess its robustness. The results confirm that CNN-based Mel spectrogram analysis is an effective method for audio deepfake detection, and they underline the importance of interpretability for trustworthy media detection systems.
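
The four evaluation metrics named in the abstract can be computed from binary predictions as in the minimal sketch below. The label convention (1 = fake, 0 = real), the function name `binary_metrics`, and the toy labels are assumptions made for illustration; they are not taken from the paper and do not reproduce its results.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute precision, recall, F1-score, and accuracy for the positive class (1)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Confusion-matrix counts, treating label 1 as the positive class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    accuracy = (tp + tn) / len(y_true) if len(y_true) else 0.0

    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical labels: 1 = fake, 0 = real (toy values, not results from the paper).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting precision and recall alongside accuracy matters when the real and fake classes are imbalanced, since accuracy alone can look high for a model that rarely predicts the minority class.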

Year: 2025

ISBN: 9783031877834; 9783031877841

Appears in types: 4.1 Contributions in conference proceedings

Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4910236

UniSa - IRIS Institutional Research Information System
