
Benchmarking deep neural networks for gesture recognition on embedded devices

Bini S.; Greco A.; Saggese A.; Vento M.
2022

Abstract

Gestures are among the most widely used forms of communication between humans. In recent years, with the trend of adapting factories to the Industry 4.0 paradigm, the scientific community has shown growing interest in the design of Gesture Recognition (GR) algorithms for Human-Robot Interaction (HRI) applications. In this context, GR algorithms must run in real time on embedded platforms with limited resources. However, in the available scientific literature, the proposed neural networks (both 2D and 3D) and the modalities used to feed them (e.g. RGB, RGB-D, optical flow) typically aim to optimize accuracy, paying little attention to feasibility on low-power hardware devices. Yet the analysis of the trade-off between accuracy and computational burden (for both networks and modalities) is essential to allow GR algorithms to work in industrial robotics applications. In this paper, we perform an extensive benchmark focusing not only on accuracy but also on computational burden, involving two architectures (2D and 3D), two backbones (MobileNet, ResNeXt) and four input modalities (RGB, Depth, Optical Flow, Motion History Image), as well as their combinations.
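As an illustration of one of the four benchmarked modalities, a Motion History Image (MHI) can be sketched with plain NumPy. The decay-based update below follows the classic formulation (moving pixels are set to a maximum timestamp value, all others decay toward zero); the threshold and decay horizon `tau` are arbitrary illustrative values, not parameters taken from the paper.

```python
import numpy as np

def update_mhi(mhi, prev_frame, frame, tau=15, threshold=30):
    """Update a Motion History Image.

    Pixels whose absolute frame difference exceeds `threshold` are
    marked as moving and set to `tau`; all other pixels decay by 1
    toward 0, so brighter values indicate more recent motion.
    """
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    motion_mask = diff >= threshold
    return np.where(motion_mask, tau, np.maximum(mhi - 1, 0))

# Toy sequence: a bright square moving one pixel to the right per frame.
frames = []
for t in range(5):
    f = np.zeros((32, 32), dtype=np.uint8)
    f[10:15, 10 + t:15 + t] = 255
    frames.append(f)

mhi = np.zeros((32, 32), dtype=np.int16)
for prev, cur in zip(frames, frames[1:]):
    mhi = update_mhi(mhi, prev, cur)
```

The resulting single-channel image encodes short-term motion history and can be fed to a 2D network in place of (or alongside) raw RGB frames, which is what makes it attractive for low-power devices: it summarizes temporal information without requiring a 3D convolutional backbone.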
Files in this product:
There are no files associated with this product.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4807811
Warning: the data displayed have not been validated by the university.

Citations
  • PMC: not available
  • Scopus: 0
  • Web of Science: not available