Worthiness Benchmark: A novel concept for analyzing binary classification evaluation metrics
Di Mauro M.
2024
Abstract
Binary classification deals with identifying whether elements belong to one of two possible categories. Various metrics exist to evaluate the performance of such classification systems, and it is important to study and contrast them in order to find the most suitable one for assessing a particular system. Despite extensive research in this field, a systematic comparison of these evaluation metrics remains an open problem. The performance of a classifier is usually summarized by the confusion matrix, a table counting the correct and incorrect predictions for each category. To judge whether one classifier is better than another, one must examine how their confusion matrices differ; however, no agreed-upon method exists for this analysis. This matters because different metrics may interpret and rank two confusion matrices differently. We introduce the Worthiness Benchmark (γ), a new concept that characterizes the principles by which performance metrics rank classifiers. In particular, the Worthiness Benchmark makes explicit how a metric decides which of two classifiers is superior by analyzing the differences in their confusion matrices. Through this new concept, we address the central challenge of selecting the most appropriate metric for evaluating a classifier. We then perform a γ-analysis of several binary classification metrics to outline the specific benchmarks these metrics follow when comparing different classifiers.
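To make the motivating claim concrete, the short Python sketch below (not part of the paper, and not an implementation of the Worthiness Benchmark itself) compares two invented confusion matrices with three standard metrics. The classifier names, counts, and the imbalanced 100-positive/900-negative test set are purely illustrative; the point is only that different metrics can rank the same pair of confusion matrices differently.

```python
from math import sqrt

def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)

def f1_score(tp, fp, fn, tn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def mcc(tp, fp, fn, tn):
    # Matthews correlation coefficient
    num = tp * tn - fp * fn
    den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den

# Two hypothetical confusion matrices on the same imbalanced test set
# (100 positives, 900 negatives); all values are invented for illustration.
clf_a = dict(tp=10, fp=5, fn=90, tn=895)    # conservative classifier
clf_b = dict(tp=80, fp=120, fn=20, tn=780)  # aggressive classifier

for name, cm in [("A", clf_a), ("B", clf_b)]:
    print(f"Classifier {name}: "
          f"accuracy={accuracy(**cm):.3f}  "
          f"F1={f1_score(**cm):.3f}  "
          f"MCC={mcc(**cm):.3f}")

# Accuracy ranks A above B (0.905 vs 0.860), while F1 and MCC rank B above A,
# so the choice of metric decides which classifier is judged "better".
```

This disagreement between metrics is exactly the situation a γ-analysis is meant to characterize: each metric embeds its own criterion for deciding which differences between two confusion matrices count as an improvement.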