Worthiness Benchmark: A novel concept for analyzing binary classification evaluation metrics
Di Mauro M.
2024
Abstract
Binary classification deals with identifying whether elements belong to one of two possible categories. Various metrics exist to evaluate the performance of such classification systems, and it is important to study and contrast them in order to find the most suitable one for assessing a particular system. Despite extensive research in this field, a systematic comparison of these evaluation metrics remains an open problem. The performance of a classifier is usually summarized by the confusion matrix, a table counting the correct and incorrect predictions for each category. To judge whether one classifier is better than another, one must examine how their confusion matrices differ; however, no agreed-upon method exists for this analysis. This matters because different metrics may interpret and rank two confusion matrices differently. We introduce the Worthiness Benchmark (γ), a new concept that characterizes the principles by which performance metrics rank classifiers. In particular, the Worthiness Benchmark makes explicit how a metric decides which of two classifiers is superior by analyzing the differences in their confusion matrices. Through this new concept, we address the central challenge of selecting the most appropriate metric for evaluating a classifier. We then perform a γ-analysis of several binary classification metrics to outline the specific benchmarks these metrics follow when comparing different classifiers.
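To make the motivating claim concrete, the short Python sketch below (not part of the paper, and not an implementation of the Worthiness Benchmark itself) compares two invented confusion matrices with three standard metrics. The classifier names, counts, and the imbalanced 100-positive/900-negative test set are purely illustrative; the point is only that different metrics can rank the same pair of confusion matrices differently.

```python
from math import sqrt

def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)

def f1_score(tp, fp, fn, tn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def mcc(tp, fp, fn, tn):
    # Matthews correlation coefficient
    num = tp * tn - fp * fn
    den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den

# Two hypothetical confusion matrices on the same imbalanced test set
# (100 positives, 900 negatives); all values are invented for illustration.
clf_a = dict(tp=10, fp=5, fn=90, tn=895)    # conservative classifier
clf_b = dict(tp=80, fp=120, fn=20, tn=780)  # aggressive classifier

for name, cm in [("A", clf_a), ("B", clf_b)]:
    print(f"Classifier {name}: "
          f"accuracy={accuracy(**cm):.3f}  "
          f"F1={f1_score(**cm):.3f}  "
          f"MCC={mcc(**cm):.3f}")

# Accuracy ranks A above B (0.905 vs 0.860), while F1 and MCC rank B above A,
# so the choice of metric decides which classifier is judged "better".
```

This disagreement between metrics is exactly the situation a γ-analysis is meant to characterize: each metric embeds its own criterion for deciding which differences between two confusion matrices count as an improvement.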