Surveillance of individuals using visual data requires human-level capabilities for understanding the characteristics that differentiate one person from another. However, because the influx of video and imagery is growing faster than humans can cope with, biometric-based surveillance systems are needed to help triage this information in response to human-generated queries. Unfortunately, current systems are not robust enough to tackle new tasks, because they rely on specialized models that do not leverage existing, pre-trained components. To mitigate these issues, we propose a novel system for biometric-based surveillance that uses relevance-aware models to triage images and videos through interaction with one or more users. Because the system initially focuses on detecting people by their appearance and clothing, we have named it Context and Collaborative (C2) Visual Question Answering (VQA) for Biometric Object-Attribute Relevance and Surveillance (C2VQA-BOARS). To validate the usefulness of C2VQA-BOARS in real-world scenarios, we implement two novel components (Relevance and Triage) and apply them to tasks on two datasets created for biometric surveillance. Our approach outperforms baseline approaches, showing that a system with a minimal number of fine-tuned components can robustly handle new datasets and problems as needed.
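The abstract does not describe how the Relevance and Triage components work internally. As a rough illustration of the general idea of relevance-gated triage, the sketch below first discards images for which a user's query is not relevant, then ranks the remaining images by how well they match the queried attributes. Everything here is a hypothetical assumption for illustration: the `ImageRecord` type, the `has_person` flag, the `is_relevant` rule, the `triage` function, the attribute names, the mean-confidence score and the `threshold` parameter. None of it is the VQA-based models used in C2VQA-BOARS.

```python
from dataclasses import dataclass, field


@dataclass
class ImageRecord:
    """Hypothetical per-image record produced by some upstream detector."""
    image_id: str
    # Assumed attribute detector output: attribute name -> confidence in [0, 1].
    attributes: dict[str, float] = field(default_factory=dict)
    # Assumed flag: whether any person was detected in the image.
    has_person: bool = False


def is_relevant(query_attrs: list[str], record: ImageRecord) -> bool:
    """Hypothetical relevance rule: a query about a person's appearance or
    clothing is only relevant to images that contain a person."""
    return record.has_person


def triage(query_attrs: list[str], records: list[ImageRecord],
           threshold: float = 0.5) -> list[tuple[str, float]]:
    """Return (image_id, score) pairs for relevant images whose mean attribute
    confidence meets the threshold, sorted from highest to lowest score."""
    if not query_attrs:
        raise ValueError("query must name at least one attribute")
    ranked = []
    for record in records:
        # Step 1 (relevance): skip images the query does not apply to.
        if not is_relevant(query_attrs, record):
            continue
        # Step 2 (triage): average confidence over the queried attributes;
        # an attribute the detector did not report counts as 0.
        score = sum(record.attributes.get(a, 0.0) for a in query_attrs) / len(query_attrs)
        if score >= threshold:
            ranked.append((record.image_id, score))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked
```

For example, with a query for `["red shirt", "backpack"]`, an image without a detected person is dropped by the relevance step even if it has a high "red shirt" confidence, so the analyst only reviews images where the question actually makes sense.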
|Title:|Biometric surveillance using visual question answering|
|Publication date:|2018|
|Item type:|1.1.1 Journal article with DOI|