Multi-task learning on the edge for effective gender, age, ethnicity and emotion recognition

Foggia P.; Greco A.; Saggese A.; Vento M.
2023-01-01

Abstract

A growing number of real-world applications, especially in cognitive robotics, require several computer vision algorithms to run in parallel on board embedded devices with limited GPU and memory resources. Multi-task learning, namely the use of a single model to perform multiple classification and/or regression tasks by learning a shared low-level representation, has proven to be a valid way to reduce computation and memory requirements while preserving accuracy. In this paper, we propose a solution for real-time user profiling based on a multi-task convolutional neural network (CNN) for gender, age, ethnicity and emotion recognition from face images. To find the best trade-off between accuracy and processing time, we evaluate three different architectures, specifically designed for this purpose, and backbones based on MobileNet, ResNet and SENet, which include convolutional layers, residual blocks and attention modules that have already demonstrated great potential in face analysis. We trained the multi-task neural network with a custom learning procedure that addresses the problems of missing labels, dataset imbalance and loss function imbalance through label masking, batch balancing and a custom weighted loss function; no other multi-task neural network for face analysis addresses all these challenges simultaneously. The proposed solution proved effective in comparison with the corresponding single-task CNNs in terms of accuracy, processing time and memory space: the multi-task CNNs achieved a processing speed-up of between 2.5 and 4 times and a reduction in memory space of between 2 and 4 times, while preserving accuracy. Moreover, the insights that emerge from the experiments make it possible to choose a face analysis solution that is easy to integrate into real applications on smart cameras and embedded systems and that best suits the computational constraints of the specific application.
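As a rough illustration of how label masking and a weighted loss can be combined, the following minimal sketch computes a multi-task loss in plain Python. It is not the authors' implementation: the abstract does not specify the loss, so the function names, the `None` marker for a missing label, the treatment of every task as classification, and the task weights are all assumptions made for this example.

```python
import math
from typing import Dict, List, Optional, Sequence

# Hypothetical convention: None marks a sample with no annotation for a task.
MISSING = None


def cross_entropy(probs: Sequence[float], label: int) -> float:
    """Negative log-likelihood of the true class (clamped to avoid log(0))."""
    return -math.log(max(probs[label], 1e-12))


def masked_multitask_loss(
    predictions: Dict[str, List[Sequence[float]]],
    labels: Dict[str, List[Optional[int]]],
    task_weights: Dict[str, float],
) -> float:
    """Weighted sum of per-task losses, ignoring samples with missing labels."""
    total = 0.0
    for task, weight in task_weights.items():
        # Label masking: only samples annotated for this task contribute.
        losses = [
            cross_entropy(p, y)
            for p, y in zip(predictions[task], labels[task])
            if y is not MISSING
        ]
        if losses:  # a task with no labelled sample in the batch is skipped
            total += weight * sum(losses) / len(losses)
    return total


# Two samples, two tasks; the second sample has no gender label and
# neither sample has an emotion label.
preds = {
    "gender": [[0.5, 0.5], [0.9, 0.1]],
    "emotion": [[0.25, 0.75], [0.5, 0.5]],
}
labels = {"gender": [0, None], "emotion": [None, None]}
weights = {"gender": 1.0, "emotion": 2.0}  # illustrative weights only
loss = masked_multitask_loss(preds, labels, weights)
```

Here only the first sample's gender prediction contributes, so the loss reduces to the cross-entropy of a 0.5 probability. Averaging each task over its labelled samples keeps a sparsely annotated task from being swamped by the others, which is one simple way to combine datasets annotated for different attributes.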

Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4817633
Citations
  • PubMed Central: n/a
  • Scopus: 14
  • Web of Science (ISI): 11