UniSa - IRIS Institutional Research Information System

Explaining vulnerabilities of biased news classifiers through rough sets and granular computing

Fenza, Giuseppe;Gaeta, Angelo;Loia, Vincenzo;Orciuoli, Francesco;Stanzione, Claudio

2025

Abstract

In the evolving landscape of artificial intelligence, ensuring the robustness and explainability of machine learning models is valuable. This study presents a method based on Rough Set Theory and the Principles of Justified Granularity to enhance the explainability of text-based classifiers, specifically in style-based news bias classification. The method helps explain why a classifier can be deceived by an adversarial attack. It leverages two levels of insight. The first level is independent of the specific classifier and consists of generating rules from a boundary region built with Rough Set Theory from the training data. The second level considers the behavior of a specific machine learning model when classifying manipulated observations and, starting from the classification results, constructs information granules of true positives and false negatives. These granules are representative of the observations that deceived the classifier. Comparing the boundary rules with the information granules yields actionable knowledge for deciding how to make a machine learning model more resilient. Results are evaluated on real data containing biased news. The success rate of adversarial examples generated with an LLM to test classifiers on borderline cases, where minor textual changes cause false negatives, ranges from 45% to 68%.
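The classifier-independent first level rests on the standard rough-set notion of a boundary region: observations that share identical condition attributes but carry different labels. The following is a minimal sketch of that computation only, not the authors' implementation. The feature names (`emotive`, `hedging`), the toy rows, and the function name `boundary_region` are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def boundary_region(rows, condition_attrs, decision_attr, target):
    """Return (lower, upper, boundary) index sets for the concept
    `decision_attr == target` under the given condition attributes."""
    # Indiscernibility classes: rows with identical condition values.
    classes = defaultdict(set)
    for i, row in enumerate(rows):
        key = tuple(row[a] for a in condition_attrs)
        classes[key].add(i)

    # The concept to approximate: all rows labelled with the target class.
    concept = {i for i, row in enumerate(rows) if row[decision_attr] == target}

    lower, upper = set(), set()
    for members in classes.values():
        if members <= concept:   # class lies entirely inside the concept
            lower |= members
        if members & concept:    # class overlaps the concept
            upper |= members

    # Boundary: rows that cannot be classified with certainty.
    return lower, upper, upper - lower


# Hypothetical, already-discretized style features (illustration only).
rows = [
    {"emotive": "high", "hedging": "low", "label": "biased"},
    {"emotive": "high", "hedging": "low", "label": "neutral"},
    {"emotive": "low", "hedging": "high", "label": "neutral"},
    {"emotive": "high", "hedging": "high", "label": "biased"},
]

lower, upper, boundary = boundary_region(
    rows, ["emotive", "hedging"], "label", "biased"
)
print(sorted(lower), sorted(upper), sorted(boundary))
```

In this toy table, rows 0 and 1 are indiscernible yet labelled differently, so they form the boundary region. Rules describing such rows characterize the borderline cases in which a small change to the text could plausibly flip a classifier's decision.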

Year of publication

2025

Appears in the types:

1.1 Journal Articles

Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4945060
