RobustDRNet: A Clinically-Aligned Hybrid Ensemble Model with Multi-Method Explainability for Lesion-Aware Diabetic Retinopathy Grading

Khokhar, Pir Bakhsh (Software); Pentangelo, Viviana (Methodology); Gravino, Carmine (Supervision); Palomba, Fabio (Supervision)
2025

Abstract

Diabetic retinopathy (DR) screening requires artificial intelligence (AI) models that not only grade the five clinical stages with high accuracy but also generate reliable lesion-level explanations that clinicians can trust. We propose RobustDRNet, a hybrid ensemble model that combines local convolutional features from Residual Network-34 (ResNet-34) and ConvNeXt-Tiny with global transformer embeddings from Vision Transformer Base/16 (ViT-B/16) through two-stage feature fusion and a disentangled multilayer perceptron (MLP), followed by a stacking logistic regression meta-learner that aggregates the predictions. To cope with severe class imbalance, the training pipeline combines stratified sampling, contrast-limited adaptive histogram equalization (CLAHE) for contrast enhancement, aggressive data augmentation, and class-weighted focal loss. Evaluated on the Asia Pacific Tele-Ophthalmology Society (APTOS) 2019 dataset, RobustDRNet achieves 88.4% validation accuracy, a macro-averaged area under the receiver operating characteristic curve (macro-AUC) of 0.967, and a Cohen's kappa of 0.823, outperforming the individual backbones and simple voting ensembles. Beyond classification performance, we integrate six complementary explainable AI (XAI) techniques, namely Gradient-weighted Class Activation Mapping++ (Grad-CAM++), Integrated Gradients, attention rollout, SHapley Additive exPlanations (SHAP), Local Interpretable Model-Agnostic Explanations (LIME), and Testing with Concept Activation Vectors (TCAV), each quantitatively benchmarked against expert-annotated lesion maps from the Indian Diabetic Retinopathy Image Dataset (IDRiD).
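The abstract names class-weighted focal loss but does not give its exact form. The following is a minimal sketch, assuming the standard multi-class focal loss FL = -alpha_t * (1 - p_t)^gamma * log(p_t), with gamma = 2 and per-class weights alpha set by an inverse-frequency ("balanced") heuristic. Both settings, the toy labels, and the helper names are illustrative assumptions, not RobustDRNet's reported configuration; plain Python is used for clarity, whereas a real training loop would use a deep-learning framework.

```python
import math
from collections import Counter


def balanced_class_weights(labels, num_classes):
    """Inverse-frequency weights n_samples / (num_classes * n_c).

    Assumption: mirrors the common "balanced" heuristic; the weighting
    actually used for RobustDRNet is not stated in the abstract.
    Classes absent from `labels` receive weight 0.0.
    """
    counts = Counter(labels)
    n = len(labels)
    return [n / (num_classes * counts[c]) if counts[c] else 0.0
            for c in range(num_classes)]


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def focal_loss(logits, target, class_weights, gamma=2.0):
    """Class-weighted focal loss for one sample:
    -alpha_t * (1 - p_t)**gamma * log(p_t).
    With gamma = 0 and unit weights it reduces to cross-entropy.
    """
    log_p_t = log_softmax(logits)[target]
    p_t = math.exp(log_p_t)
    return -class_weights[target] * (1.0 - p_t) ** gamma * log_p_t


def mean_focal_loss(batch_logits, targets, class_weights, gamma=2.0):
    """Average the per-sample focal loss over a batch."""
    losses = [focal_loss(z, y, class_weights, gamma)
              for z, y in zip(batch_logits, targets)]
    return sum(losses) / len(losses)


# Usage: a toy, imbalanced five-grade label set (hypothetical data).
labels = [0] * 8 + [2] * 4 + [1, 3, 4, 4]
weights = balanced_class_weights(labels, num_classes=5)
loss = mean_focal_loss([[2.0, 0.1, 0.3, -1.0, -1.0],
                        [0.2, 0.1, 0.3, 0.0, 1.5]],
                       targets=[0, 4], class_weights=weights)
print(weights, loss)
```

Setting gamma = 0 with unit weights recovers ordinary cross-entropy; raising gamma shrinks the loss on confidently classified samples, so training concentrates on hard examples, while the class weights give rare grades a larger share of the loss.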
Saliency maps achieve mean Intersection over Union (IoU) scores of 0.06 for Grad-CAM++ and approximately 0.10 for Integrated Gradients; SHAP perturbations yield a deletion drop of 0.25 and an insertion gain of 0.22; and TCAV achieves perfect concept alignment (score = 1.0) with clinically coherent, grade-wise importance trajectories. By combining cutting-edge grading with multi-perspective, clinically validated interpretability, RobustDRNet offers a deployable DR screening solution whose decisions are both highly accurate and transparently grounded in lesion-level pathology.
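Computing IoU against IDRiD lesion maps requires turning a continuous saliency map into a binary mask first, a step the abstract does not describe. The sketch below assumes the top `fraction` of pixels by saliency value are kept and that per-image IoU is averaged; the thresholding rule, the `fraction` value, the toy arrays, and the function names are all hypothetical.

```python
def binarize_top_fraction(saliency, fraction=0.1):
    """Keep the `fraction` most salient pixels of a 2-D map (list of rows).

    Assumption: top-fraction thresholding is one common choice; the abstract
    does not say how saliency maps were binarized. Pixels tied with the
    threshold value are all kept, so slightly more than `fraction` may survive.
    """
    flat = sorted((v for row in saliency for v in row), reverse=True)
    k = max(1, int(round(fraction * len(flat))))
    threshold = flat[k - 1]
    return [[1 if v >= threshold else 0 for v in row] for row in saliency]


def iou(pred_mask, true_mask):
    """Intersection over Union of two equal-shape binary (0/1) masks.

    Returns 0.0 when both masks are empty (a convention, not a standard).
    """
    inter = union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            inter += p & t
            union += p | t
    return inter / union if union else 0.0


def mean_saliency_iou(saliency_maps, lesion_masks, fraction=0.1):
    """Mean IoU between binarized saliency maps and expert lesion masks."""
    scores = [iou(binarize_top_fraction(s, fraction), m)
              for s, m in zip(saliency_maps, lesion_masks)]
    return sum(scores) / len(scores)


# Usage on a toy 4x4 example (hypothetical values, not IDRiD data).
saliency = [[0.9, 0.8, 0.0, 0.0],
            [0.7, 0.0, 0.0, 0.0],
            [0.0, 0.0, 0.0, 0.0],
            [0.0, 0.0, 0.0, 0.1]]
lesion = [[1, 1, 0, 0],
          [0, 1, 0, 0],
          [0, 0, 0, 0],
          [0, 0, 0, 0]]
print(mean_saliency_iou([saliency], [lesion], fraction=0.25))
```

With fraction = 0.25 the toy example keeps four pixels, two of which overlap the three lesion pixels, giving IoU = 2 / 5 = 0.4.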

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4919828