An Experimental Evaluation of Smart Sensors for Pedestrian Attribute Recognition Using Multi-Task Learning and Vision Language Models

Greco A.; Saggese A.
2025

Abstract

This paper presents the experimental evaluation and analyzes the results of the first edition of the pedestrian attribute recognition (PAR) contest, an international competition focused on smart visual sensors that use multi-task computer vision methods to recognize binary and multi-class pedestrian attributes from images. The participating teams designed intelligent sensors based on vision-language models, transformers and convolutional neural networks; these sensors address the multi-label recognition problem by leveraging task interdependencies to improve model efficiency and effectiveness. Participants were provided with the MIVIA PAR Dataset, containing 105,244 annotated pedestrian images for training and validation, and their methods were evaluated on a private test set of over 20,000 images. We analyze the smart visual sensors proposed by the participating teams, examining the results in terms of accuracy, standard deviation and confusion matrices, and highlighting the correlations between design choices and performance. The results of this evaluation, conducted in a challenging and realistic framework, suggest possible directions for improving these smart sensors, which are discussed in detail in the paper.
Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4906676