DRIVE: Distributed Robotic Intelligence for Vision-based Exploration for retail shelf monitoring

Saggese, Alessia; Vento, Mario
2026

Abstract

Out-of-stock (OOS) detection in retail environments is essential for efficient inventory management and high customer satisfaction. Mobile robotic platforms equipped with a camera and an empty-shelf object detector have emerged as a promising solution. However, detector-based approaches suffer from a fundamental trade-off between false positives and missed detections, limited generalization due to the small size of available training datasets, and a high false positive rate in cluttered retail scenes. To overcome these challenges, we propose DRIVE (Distributed Robotic Intelligence for Vision-based Exploration), a novel distributed architecture that combines a lightweight on-board object detector with a cloud-based, transformer-powered semantic validation stage. This two-tier design mitigates the precision–recall trade-off of traditional detectors, reducing false positives without sacrificing recall while remaining feasible in real time on resource-constrained platforms. Furthermore, to enable robust domain adaptation in low-data regimes without catastrophic forgetting, we fine-tune the vision transformer backbone with Parameter-Efficient Fine-Tuning (PEFT) via Low-Rank Adaptation (LoRA), which adds less than 1% additional parameters while preserving pretrained knowledge. Extensive experiments in real supermarket environments demonstrate that DRIVE achieves strong robustness and accuracy compared to state-of-the-art detection-based solutions, paving the way for scalable, autonomous OOS detection in dynamic retail scenarios.
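
To give a sense of why LoRA can stay below 1% additional parameters, the following is a minimal back-of-the-envelope sketch, not the configuration used in the paper. A LoRA adapter on a weight matrix of shape d_out × d_in adds two low-rank factors with rank × (d_in + d_out) parameters in total. The backbone size (a ViT-Base-like model with 12 blocks, hidden size 768, roughly 86M parameters), the rank of 8, and the choice to adapt only the query and value projections are all assumptions made for illustration.

```python
def lora_extra_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter.

    The adapter factorizes the weight update as B @ A, where
    A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * (d_in + d_out)


# Hypothetical configuration, NOT taken from the paper:
# a ViT-Base-like backbone with adapters on two projections per block.
NUM_BLOCKS = 12               # transformer blocks (assumed)
HIDDEN = 768                  # hidden size of each projection (assumed)
BACKBONE_PARAMS = 86_000_000  # approximate backbone size (assumed)
ADAPTED_PER_BLOCK = 2         # query and value projections (assumed)
RANK = 8                      # LoRA rank (assumed)

extra = NUM_BLOCKS * ADAPTED_PER_BLOCK * lora_extra_params(HIDDEN, HIDDEN, RANK)
fraction = extra / BACKBONE_PARAMS

print(f"extra parameters: {extra:,}")
print(f"relative increase: {fraction:.2%}")
```

Under these assumed values the adapters add a few hundred thousand parameters, well under 1% of the backbone. The count grows linearly with the rank and with the number of adapted matrices, so a larger rank or adapting more projections would raise the fraction accordingly.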
Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4945480