Biometric surveillance using visual question answering

Nappi, Michele
2019-01-01

Abstract

Surveillance of individuals using visual data requires human-level capabilities for understanding the characteristics that differentiate one person from another. However, because the volume of video and imagery is growing faster than human analysts can process it, biometric-based surveillance systems are needed to help triage information based on human-generated queries. Unfortunately, current systems are not robust enough to tackle new tasks, because they rely on specialized models that do not leverage existing pre-trained components. To mitigate these issues, we propose a novel system for biometric-based surveillance that uses relevance-aware models to triage images and videos based on interaction with one or more users. Because the system initially focuses on detecting people via their appearance and clothing, we have named it Context and Collaborative (C2) Visual Question Answering (VQA) for Biometric Object-Attribute Relevance and Surveillance (C2VQA-BOARS). To validate the usefulness of C2VQA-BOARS in real-world scenarios, we implement two novel components (Relevance and Triage) and apply them to tasks on two datasets created for biometric surveillance. Our results outperform baseline approaches, indicating that a system with a minimal number of fine-tuned components can robustly handle new datasets and problems as needed.

Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4708049