This paper presents the experimental evaluation and analyzes the results of the first edition of the pedestrian attribute recognition (PAR) contest, the international competition which focused on smart visual sensors based on multi-task computer vision methods for the recognition of binary and multi-class pedestrian attributes from images. The participant teams designed intelligent sensors based on vision-language models, transformers and convolutional neural networks that address the multi-label recognition problem leveraging task interdependencies to enhance model efficiency and effectiveness. Participants were provided with the MIVIA PAR Dataset, containing 105,244 annotated pedestrian images for training and validation, and their methods were evaluated on a private test set of over 20,000 images. In the paper, we analyze the smart visual sensors proposed by the participating teams, examining the results in terms of accuracy, standard deviation and confusion matrices and highlighting the correlations between design choices and performance. The results of this experimental evaluation, conducted in a challenging and realistic framework, suggest possible directions for future improvements in these smart sensors that are thoroughly discussed in the paper.
An Experimental Evaluation of Smart Sensors for Pedestrian Attribute Recognition Using Multi-Task Learning and Vision Language Models
Greco A.;Saggese A.;
2025
Abstract
This paper presents the experimental evaluation and analyzes the results of the first edition of the pedestrian attribute recognition (PAR) contest, the international competition which focused on smart visual sensors based on multi-task computer vision methods for the recognition of binary and multi-class pedestrian attributes from images. The participant teams designed intelligent sensors based on vision-language models, transformers and convolutional neural networks that address the multi-label recognition problem leveraging task interdependencies to enhance model efficiency and effectiveness. Participants were provided with the MIVIA PAR Dataset, containing 105,244 annotated pedestrian images for training and validation, and their methods were evaluated on a private test set of over 20,000 images. In the paper, we analyze the smart visual sensors proposed by the participating teams, examining the results in terms of accuracy, standard deviation and confusion matrices and highlighting the correlations between design choices and performance. The results of this experimental evaluation, conducted in a challenging and realistic framework, suggest possible directions for future improvements in these smart sensors that are thoroughly discussed in the paper.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.