A new procedure for variable selection in presence of rare events
Francesco Giordano; Marcella Niglio; Marialuisa Restaino
2020-01-01
Abstract
This paper proposes a new method to select the most relevant covariates for predicting bank defaults. Because bank failure is a rare event, we estimate the probability of default of financial institutions using Generalized Extreme Value regression and implement a variable selection procedure suited to a binary dependent variable with far fewer ones than zeros. The proposed procedure has three advantages. First, it does not use any penalty function, so no regularization parameters need to be estimated. Second, it is easy to implement and computationally efficient. Third, it accounts for the dependence structure and works well when the data are strongly correlated. We validate the variable selection procedure with a simulation study. We then apply it to a dataset of Italian banks and evaluate how well it identifies the covariates that most influence the failure probability. The results of both the simulation study and the empirical analysis show that our proposal outperforms other variable selection approaches, such as the forward stepwise method.