Balancing the user-driven feature selection and their incidence in the clustering structure formation

Di Martino, Ferdinando; Senatore, Sabrina

doi:https://doi.org/10.1016/j.asoc.2020.106854

Feature selection is a key step in mining high-dimensional data: identifying the features that preserve the data structure while discarding redundant ones is crucial to improving the final performance of classification methods. At the same time, an accurate understanding of feature domains may require human intervention to balance the importance of structure-based features with those dictated by human expertise. To address this issue, this work introduces a human-driven feature selection method for data clustering. The algorithm, called Feature Selection EFCM (FS-EFCM for short), aims to support the relevance of some features from the domain of interest while preserving their incidence in the natural clustering structure. The relevance and incidence of each feature are assessed during the FS-EFCM execution in order to find a balance between the human suggestions about feature importance and the feature incidence in the natural cluster-based data distribution. Experimental results and comparisons show that the algorithm is robust in the presence of weakly significant features, and the classification performance demonstrates the effectiveness of the proposed feature selection method compared with well-known feature selection algorithms.