Explaining vulnerabilities of biased news classifiers through rough sets and granular computing
Fenza, Giuseppe; Gaeta, Angelo; Loia, Vincenzo; Orciuoli, Francesco; Stanzione, Claudio
2025
Abstract
In the evolving landscape of artificial intelligence, ensuring the robustness and explainability of machine learning models is valuable. This study presents an innovative method based on Rough Set Theory and the Principles of Justified Granularity to enhance the explainability of text-based classifiers, specifically in style-based news bias classification. The method helps explain why a classifier can be deceived by an adversarial attack. It leverages two levels of insight. The first level is independent of the specific classifier and consists of generating rules from a boundary region built with Rough Set Theory from the training data. The second level considers the behavior of a specific machine learning model when classifying manipulated observations and, starting from the classification results, constructs information granules of true positives and false negatives. These granules are representative of observations that deceived the classifier. Comparing boundary rules with information granules yields actionable knowledge that supports decisions on how to make a machine learning model more resilient. Results are evaluated on real data containing biased news. The success rate of adversarial examples generated with an LLM to test classifiers on borderline cases, where minor textual changes cause false negatives, ranges from 45% to 68%.
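As an illustration of the first, classifier-independent level, the boundary region of a concept in Rough Set Theory is the set of observations that share all condition-attribute values with at least one observation of a different class. The sketch below is a minimal, generic computation of the lower approximation, upper approximation and boundary region; it is not the authors' implementation. The function name `boundary_region`, the feature names `emotive` and `hedging`, and the toy rows are hypothetical and assume the style features have already been discretized.

```python
from collections import defaultdict


def boundary_region(rows, condition_attrs, decision_attr, target):
    """Return (lower, upper, boundary) index sets for the concept decision == target."""
    # Group observations into indiscernibility classes: rows that share the
    # same values on every condition attribute cannot be told apart.
    classes = defaultdict(set)
    for i, row in enumerate(rows):
        key = tuple(row[a] for a in condition_attrs)
        classes[key].add(i)

    # The concept is the set of observations labeled with the target class.
    concept = {i for i, row in enumerate(rows) if row[decision_attr] == target}

    lower, upper = set(), set()
    for members in classes.values():
        if members <= concept:   # class lies entirely inside the concept
            lower |= members
        if members & concept:    # class overlaps the concept
            upper |= members
    # Boundary: observations that may or may not belong to the concept.
    return lower, upper, upper - lower


# Hypothetical toy data (assumption, not taken from the study).
rows = [
    {"emotive": "high", "hedging": "low", "label": "biased"},
    {"emotive": "high", "hedging": "low", "label": "unbiased"},
    {"emotive": "high", "hedging": "high", "label": "biased"},
    {"emotive": "low", "hedging": "high", "label": "unbiased"},
    {"emotive": "low", "hedging": "low", "label": "unbiased"},
]
lower, upper, boundary = boundary_region(
    rows, ["emotive", "hedging"], "label", "biased"
)
```

In this toy example rows 0 and 1 have identical feature values but different labels, so both fall in the boundary region. Observations of this kind are the borderline cases from which boundary rules would be generated.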


