RobustDRNet: A Clinically-Aligned Hybrid Ensemble Model with Multi-Method Explainability for Lesion-Aware Diabetic Retinopathy Grading
Khokhar, Pir Bakhsh (Software); Pentangelo, Viviana (Methodology); Gravino, Carmine (Supervision); Palomba, Fabio (Supervision)
2025
Abstract
Diabetic retinopathy (DR) screening requires artificial intelligence (AI) models that are not only highly accurate in grading the five clinical stages but also capable of generating reliable lesion-level explanations to earn the trust of clinicians. We propose RobustDRNet, a hybrid ensemble model that combines local convolutional features from Residual Network-34 (ResNet-34) and ConvNeXt-Tiny with global transformer embeddings from the Vision Transformer Base/16 (ViT-B/16) via a two-stage fusion: a disentangled multilayer perceptron (MLP) for feature fusion, followed by a stacking logistic regression meta-learner for prediction aggregation. To address the severe class imbalance, our training pipeline employs stratified sampling, contrast-limited adaptive histogram equalization (CLAHE) for contrast enhancement, strong data augmentation, and a class-weighted focal loss. Evaluated on the Asia Pacific Tele-Ophthalmology Society (APTOS) 2019 dataset, RobustDRNet achieved 88.4% validation accuracy, a 0.967 macro-averaged area under the receiver operating characteristic curve (macro-AUC), and a Cohen's kappa of 0.823, outperforming the individual backbones and simple voting ensembles. Beyond classification performance, we integrated six complementary explainable AI (XAI) techniques: Gradient-weighted Class Activation Mapping++ (Grad-CAM++), Integrated Gradients, attention rollout, SHapley Additive exPlanations (SHAP), Local Interpretable Model-Agnostic Explanations (LIME), and Testing with Concept Activation Vectors (TCAV), each quantitatively benchmarked against expert-annotated lesion maps from the Indian Diabetic Retinopathy Image Dataset (IDRiD). Saliency maps achieve mean Intersection over Union (IoU) scores of 0.06 for Grad-CAM++ and ~0.10 for Integrated Gradients; SHAP perturbations show a deletion drop of 0.25 and an insertion gain of 0.22; and TCAV achieves perfect concept alignment (score = 1.0) with clinically coherent, grade-wise importance trajectories. By combining cutting-edge grading with multi-perspective and clinically validated interpretability, RobustDRNet delivers a deployable DR screening solution whose decisions are both highly accurate and transparently grounded in lesion-level pathology.
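The abstract describes the backbone combination and two-stage fusion only at a high level. The PyTorch sketch below illustrates one plausible reading of the first stage; the layer sizes, dropout rate, and the specific fusion-MLP layout are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
from torchvision import models


class HybridFusionNet(nn.Module):
    """Illustrative three-backbone fusion model (layer sizes are assumptions)."""

    def __init__(self, num_classes: int = 5, hidden: int = 512):
        super().__init__()
        # CNN backbones for local lesion features; classification heads removed.
        self.resnet = models.resnet34(weights=None)        # load pretrained weights in practice
        self.resnet.fc = nn.Identity()                     # -> 512-d feature vector
        self.convnext = models.convnext_tiny(weights=None)
        self.convnext.classifier[2] = nn.Identity()        # -> 768-d feature vector
        # Transformer backbone for global context (expects 224x224 inputs).
        self.vit = models.vit_b_16(weights=None)
        self.vit.heads = nn.Identity()                     # -> 768-d CLS embedding
        # Stage 1: fuse the concatenated features with an MLP head.
        self.fusion_mlp = nn.Sequential(
            nn.Linear(512 + 768 + 768, hidden),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.3),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = torch.cat([self.resnet(x), self.convnext(x), self.vit(x)], dim=1)
        return self.fusion_mlp(feats)


model = HybridFusionNet()
logits = model(torch.randn(2, 3, 224, 224))   # (2, 5) class logits
```

The second stage named in the abstract, a stacking logistic regression meta-learner, could then be fit (for instance with scikit-learn's LogisticRegression) on held-out class probabilities from the individual backbones and the fusion MLP; exactly how the two stages are combined is not detailed in the abstract.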

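The class-weighted focal loss used against class imbalance follows the standard formulation FL(p_t) = -α_t (1 − p_t)^γ log(p_t). The sketch below assumes per-class weights derived from class counts and a focusing parameter γ = 2, since neither value is stated in the abstract.

```python
import torch
import torch.nn.functional as F


def class_weighted_focal_loss(logits: torch.Tensor,
                              targets: torch.Tensor,
                              alpha: torch.Tensor,
                              gamma: float = 2.0) -> torch.Tensor:
    """Class-weighted focal loss for multi-class DR grading.

    logits:  (N, C) raw model outputs
    targets: (N,)   integer grade labels in [0, C)
    alpha:   (C,)   per-class weights (e.g., inverse class frequencies)
    gamma:   focusing parameter; gamma = 0 recovers weighted cross-entropy
    """
    log_probs = F.log_softmax(logits, dim=1)
    log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)  # log p of the true class
    pt = log_pt.exp()
    at = alpha.to(logits.device)[targets]                          # per-sample class weight
    loss = -at * (1.0 - pt) ** gamma * log_pt
    return loss.mean()


# Placeholder class counts for the 5 DR grades (hypothetical numbers),
# with weights set inversely proportional to class frequency.
counts = torch.tensor([1000.0, 200.0, 500.0, 100.0, 150.0])
alpha = counts.sum() / (len(counts) * counts)
```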

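The reported IoU scores compare saliency maps against the IDRiD lesion annotations. A minimal version of such a check is sketched below, assuming the saliency map is normalized to [0, 1] and binarized at a fixed threshold; the paper's exact thresholding scheme is not given in the abstract.

```python
import numpy as np


def saliency_lesion_iou(saliency: np.ndarray,
                        lesion_mask: np.ndarray,
                        threshold: float = 0.5) -> float:
    """IoU between a thresholded saliency map and an expert lesion mask.

    saliency:    (H, W) float attribution map scaled to [0, 1]
    lesion_mask: (H, W) binary lesion annotation (e.g., an IDRiD ground-truth mask)
    threshold:   binarization cutoff for the saliency map (assumed value)
    """
    pred = saliency >= threshold
    gt = lesion_mask.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(pred, gt).sum() / union)
```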
