Highly Crowd Detection and Counting Based on Curriculum Learning
Fotia, Lidia; Percannella, Gennaro; Saggese, Alessia; Vento, Mario
2023
Abstract
Counting in highly crowded scenes is a rapidly growing field, driven by the increasing demand for accurate, real-time crowd monitoring. In this paper we formulate the problem as point detection and propose a novel training strategy devised specifically for point detection networks. The baseline architecture is the Point to Point Network (P2PNet), which has shown impressive accuracy in both localization and crowd counting. To handle both sparse and very dense scenarios, and to generalize well to both indoor and outdoor scenes, we propose a brain-inspired training strategy based on curriculum learning, combined with a customized data augmentation technique. The main idea is that the neural network mimics human learning by first considering the easy samples (sparse scenes) and then moving on to the more challenging ones (scenes with thousands of people). The experiments show that, with respect to the baseline solution, we obtain improvements of 59%, 62% and 48% on the three indices considered, namely MAE, MSE and nAP, respectively. An example of the proposed system in action is available at https://youtu.be/yAHe7CI60hE.
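The easy-to-hard ordering described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the difficulty criterion, the pacing schedule, or the data augmentation, so everything here is an assumption made for illustration: difficulty is taken to be the number of annotated people per image, the stages are cumulative, and the names `Sample` and `curriculum_stages` are hypothetical rather than taken from the paper or from P2PNet.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    image_id: str    # hypothetical identifier of a training image
    num_people: int  # assumed difficulty proxy: annotated head points


def curriculum_stages(samples: List[Sample], num_stages: int) -> List[List[Sample]]:
    """Build cumulative training stages ordered from sparse to dense scenes.

    Stage k contains the sparsest k/num_stages fraction of the samples,
    so the last stage always covers the whole training set.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    # Easy samples (few people) come first, hard ones (many people) last.
    ordered = sorted(samples, key=lambda s: s.num_people)
    stages = []
    for k in range(1, num_stages + 1):
        cutoff = math.ceil(len(ordered) * k / num_stages)
        stages.append(ordered[:cutoff])
    return stages
```

Under these assumptions, a training loop would run some epochs on each stage in turn, so the network sees the sparse scenes before the scenes with thousands of people. Other pacing choices, such as non-cumulative stages or a continuous difficulty threshold, fit the same idea equally well.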