UniSa - IRIS Institutional Research Information System


Concept-drift detection index based on fuzzy formal concept analysis for fake news classifiers

Fenza, G; Gallo, M; Loia, V; Petrone, A; Stanzione, C

2023

Abstract

Unpredictable changes over time in the underlying distribution of streaming data are known as concept drift. Developing procedures and techniques for drift detection, interpretation, and adaptation is central to concept-drift research. Research has shown that machine learning in a concept-drift environment yields poor results if drift is not handled. This study defines a concept-drift detection index that predicts when the performance of a machine learning model for text-stream classification is low. The proposed index relies on Fuzzy Formal Concept Analysis: it exploits the formal concept lattice to determine whether newly incoming facts (e.g., news items) are well represented in the training data used to build the model. Fake news was deemed an ideal test case for this measure because its typical application scenario requires handling a stream of unstructured content with awareness of concept drift. Experiments on three news datasets revealed strong correlations (73.9%, 80.8%, and 81%) between the proposed index and the accuracy of the Random Forest (RF), Naive Bayes (NB), and Passive Aggressive (PA) models, respectively. This strong correlation suggests that the new index can help avoid incorrect classifications and support retraining decisions.
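The abstract does not spell out how the index is computed. As a rough illustration of how a fuzzy formal context can signal whether incoming news is "well represented" in the training data, the following minimal sketch treats each document as a fuzzy set of terms, applies the standard fuzzy down-arrow derivation operator with the Łukasiewicz implication (an assumption), and scores each incoming document by the height of its extent in the training context. The names `coverage_index`, `extent_degrees`, and `pearson`, and the dict-of-term-weights representation, are hypothetical. This is not the authors' index, and it skips building the full concept lattice; the `pearson` helper only shows how such an index could be correlated with per-batch classifier accuracy, the kind of analysis the abstract reports.

```python
from typing import Dict, List

# Hypothetical representation (assumption): a document is a fuzzy set of
# terms, i.e. a dict mapping term -> membership degree in [0, 1]
# (for example, max-normalised TF-IDF weights).
FuzzyDoc = Dict[str, float]


def lukasiewicz_implication(a: float, b: float) -> float:
    """Lukasiewicz residuum: a -> b = min(1, 1 - a + b)."""
    return min(1.0, 1.0 - a + b)


def extent_degrees(new_doc: FuzzyDoc, training: List[FuzzyDoc]) -> List[float]:
    """Fuzzy down-arrow operator: for each training object g, the degree to
    which g has every attribute of new_doc, i.e. inf over m of (A(m) -> I(g, m))."""
    degrees = []
    for row in training:
        degree = 1.0
        for term, a in new_doc.items():
            # Terms absent from a training document have membership 0.
            degree = min(degree, lukasiewicz_implication(a, row.get(term, 0.0)))
        degrees.append(degree)
    return degrees


def coverage_index(batch: List[FuzzyDoc], training: List[FuzzyDoc]) -> float:
    """Hypothetical drift index: mean, over the incoming batch, of the height
    (maximum degree) of each document's extent in the training context.
    Values near 1 mean the batch is well represented; low values hint at drift."""
    heights = [max(extent_degrees(doc, training)) for doc in batch]
    return sum(heights) / len(heights)


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy, made-up training context: two documents with weighted terms.
    training = [
        {"vaccine": 1.0, "trial": 0.6},
        {"election": 1.0, "ballot": 0.8},
    ]
    seen = [{"vaccine": 0.9, "trial": 0.5}]      # terms already covered
    drifted = [{"crypto": 1.0, "wallet": 0.7}]   # entirely unseen terms
    print(coverage_index(seen, training))     # 1.0
    print(coverage_index(drifted, training))  # 0.0
```

In a stream setting, one could compute the index for each incoming batch, evaluate the classifier on the same batch once labels arrive, and pass the two series to `pearson`; a sharp drop in the index would then be a cue to consider retraining. The Łukasiewicz implication is used here because it degrades gracefully: a single unseen low-weight term lowers a document's score instead of zeroing it, as the Gödel implication would.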


Year of publication

2023

Appears in types:

1.1 Journal articles

Files in this item:

File	Size	Format
1-s2.0-S0040162523003256-main (1).pdf (not available; license: publisher's copyright)	2.17 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4842655
