UniSa - IRIS Institutional Research Information System

Multiple Object Tracking and Face-based Video Retrieval: Applications of Deep Learning to Video Analysis / Gioele Ciaparrone, 11 Oct 2021, academic year 2019-2020. [10.14273/unisa-4522].

Multiple Object Tracking and Face-based Video Retrieval: Applications of Deep Learning to Video Analysis

Ciaparrone, Gioele

2021

Abstract

In recent years, deep learning (DL) has obtained numerous successes in analyzing complex data, such as images or audio. A particularly recent area of application is the analysis of videos. This thesis focuses on the application of deep learning algorithms to two video analysis tasks: Multiple Object Tracking (MOT) and Face-based Video Retrieval (FBVR). The first main part of the thesis presents an in-depth survey of the state of the art of DL-based MOT algorithms. This is the first comprehensive survey specifically on the use of DL for MOT, focusing on 2D frames extracted from single-camera videos. I identify the four main steps of an MOT algorithm and describe the various DL techniques used in the literature in each of those four steps. I also collect and compare results obtained by existing algorithms on the most common MOT datasets, and I analyze the most successful techniques employed. Finally, I discuss the open issues of current MOT algorithms, along with possible solutions and future directions of research. The second part of the thesis focuses instead on the task of FBVR. I present a novel pipeline for the retrieval of unconstrained multi-shot videos using faces, specifically in the context of television-like videos. Since no existing dataset in the literature is appropriate for an end-to-end evaluation of the proposed pipeline, I build a large-scale video dataset by adapting the VoxCeleb2 dataset to the task of FBVR. I compare and evaluate numerous DL-based approaches for the various steps in the pipeline, such as shot detection, face detection and face recognition, and I describe the advantages and disadvantages of each employed technique. The best-performing configuration of the pipeline obtains 97.25% Mean Average Precision on the independent test set, while performing each query on thousands of videos in less than 0.5 seconds. Finally, I describe the integration of the presented pipeline into the commercial software TVBridge, developed by CEDEO. [edited by Author]

Defense date
	11 Oct 2021

Doctoral programme
	Big Data Management

Keywords
	Deep learning
	Multiple object tracking
	Video retrieval

Internal supervisors
	Tagliaferri, Roberto
	Antonelli, Valerio

Appears in types:
	8.1 Doctoral thesis

Files in this item:

File	Access	Type	Size	Format
102199509685187639768888735660324991756.pdf	Open access	Other attached material	4.05 MB	Adobe PDF
105811920421060580575191264304691544458.pdf	Open access	Other attached material	248.7 kB	Adobe PDF
76866079003932674515874063023522265012.pdf	Open access	Other attached material	238.12 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4923504

Warning: the data shown have not been validated by the university.
