Maximizing the Area Under the ROC Curve by Pairwise Feature Combination

F. TORTORELLA
Methodology
2008-01-01

Abstract

Most available classification systems focus on minimizing the classification error rate. This is not always a suitable metric, especially for two-class problems with skewed class and cost distributions. In such cases, an effective criterion for measuring the quality of a decision rule is the area under the Receiver Operating Characteristic curve (AUC), which also measures the ranking quality of a classifier, as required in many real applications. In this paper we propose a nonparametric linear classifier based on the maximization of the AUC. The approach relies on the analysis of the Wilcoxon–Mann–Whitney statistic of each single feature and on an iterative pairwise coupling of the features that optimizes the ranking produced by the combined feature. Because it evaluates features pairwise, the proposed procedure differs fundamentally from other classifiers that use the AUC as a criterion. Experiments on synthetic and real data sets, together with comparisons against previous approaches, confirm the effectiveness of the proposed method.
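As an illustration of the two ingredients named in the abstract, the following minimal Python sketch computes the AUC of a single feature as the normalized Wilcoxon–Mann–Whitney statistic and then searches for a linear combination of two features that improves it. The function names (`wmw_auc`, `best_pair_combination`), the angular grid search over weights, and the toy data are assumptions made for this example only; the abstract does not specify how the paper selects the weights or orders the pairwise couplings, so this should not be read as the author's algorithm.

```python
import math
from itertools import product


def wmw_auc(pos_scores, neg_scores):
    """AUC as the normalized Wilcoxon-Mann-Whitney statistic.

    Returns the fraction of (positive, negative) pairs in which the
    positive sample scores higher; ties count as one half.
    """
    total = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            total += 1.0
        elif p == n:
            total += 0.5
    return total / (len(pos_scores) * len(neg_scores))


def best_pair_combination(pos, neg, i, j, n_angles=36):
    """Hypothetical pairwise step: grid-search a direction (wi, wj).

    The combined feature is wi * x[i] + wj * x[j], with (wi, wj) taken
    on the unit circle. The grid search is an assumption of this sketch,
    not a procedure stated in the abstract.
    """
    best_auc, best_w = -1.0, (1.0, 0.0)
    for k in range(n_angles):
        theta = 2.0 * math.pi * k / n_angles
        wi, wj = math.cos(theta), math.sin(theta)
        auc = wmw_auc(
            [wi * x[i] + wj * x[j] for x in pos],
            [wi * x[i] + wj * x[j] for x in neg],
        )
        if auc > best_auc:
            best_auc, best_w = auc, (wi, wj)
    return best_auc, best_w


# Toy data (made up for illustration): neither feature separates the
# classes on its own, but a suitable combination of the two does.
positives = [(0, 3), (3, 0), (2, 2)]
negatives = [(0, 1), (1, 0), (1, 1)]

auc_f0 = wmw_auc([x[0] for x in positives], [x[0] for x in negatives])
auc_f1 = wmw_auc([x[1] for x in positives], [x[1] for x in negatives])
auc_pair, weights = best_pair_combination(positives, negatives, 0, 1)
```

On this toy data each feature alone ranks some negatives above some positives, whereas a combination of the two ranks every positive above every negative. An iterative scheme would repeat such a step, treating the combined feature as a new single feature to be coupled with the remaining ones.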

Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4721705