Real-Time Fire Detection From Video Surveillance Cameras: Where Are We Now?
Vincenzo Carletti, Antonio Greco, Alessia Saggese
2026
Abstract
Automatic fire detection in video surveillance is a critical capability for safeguarding industrial sites, public infrastructures, and natural environments. Although recent advances in deep learning have substantially improved early fire detection, the performance of existing methods remains highly dependent on scene conditions such as viewing range, background activity, and the presence of fire-like distractors. This paper presents a comprehensive evaluation of real-time fire detection techniques on the ONFIRE and FIRE-TASTIC benchmarks, currently the most extensive datasets available for video-based fire monitoring. Through a systematic comparison of methods that differ in detector quality, temporal modeling, and semantic reasoning capabilities, we highlight the strengths and limitations of current design paradigms and analyze their behavior across scenarios that differ in terms of range and background activity. Our results reveal clear performance trends, emphasizing the crucial role of careful detector training, scenario-aware adaptation, temporal analysis, and lightweight vision–language integration. Building on these insights, we outline promising research directions, including the development of more representative datasets, improved semantic confirmation modules, and robust continual learning strategies capable of maintaining reliability in dynamic real-world deployments.


