Design Space Exploration of an In-Sensor Processor for Object Classification in Ultra-compact Time-of-Flight Sensors
Fasolino A.;Liguori R.;Di Benedetto L.;Rubino A.;Licciardo G. D.
2025
Abstract
This work presents an extensive design space exploration toward the optimal design of a hardware (HW) accelerator for multiclass object classification, implementing a configurable convolutional neural network (CNN) closely coupled with ultralow spatial resolution (ULR) time-of-flight (ToF) sensors in an in-sensor computing approach. The study uses a ULR ToF device as the sole sensing element, performing classification from the low-resolution depth map of the scene alone. The investigation, based on the STMicroelectronics VL53L8CX 8 × 8 pixel ToF sensor, achieves very high accuracy together with unprecedentedly low power consumption, compactness, and real-time operation. The CNN classifies four object classes even in the presence of partial occlusion and overlap, reaching an accuracy higher than 92% with 8-bit post-training quantization. The derived architecture, implemented in SkyWater 130-nm CMOS technology, occupies less than 1 mm² of area, draws about 11% of the sensor's overall power consumption in ranging mode, and consumes less than 80 nJ per inference with an inference time of 3.3 µs.