A Challenging Voice Dataset for Robotic Applications in Noisy Environments

Roberto, Antonio; Saggese, Alessia; Vento, Mario
2019-01-01

Abstract

Artificial Intelligence plays a fundamental role in speech-based interaction between humans and machines in cognitive robotic systems. This is particularly true in very crowded environments, such as museums or fairs, where cognitive systems could be profitably used. The existing “in the wild” datasets are not sufficiently representative for this purpose, so there is a growing need for a publicly available, more complex dataset for speaker recognition in extremely noisy conditions. In this paper, we propose the Speaker Recognition dataset in the Wild (SpReW), a novel and more challenging Italian audio database for speaker recognition tasks. Moreover, we report a quantitative evaluation, on the proposed dataset, of SincNet, a novel CNN architecture for Speaker Identification tasks. SincNet was chosen as the baseline architecture because it has obtained impressive results on widely used controlled datasets. Experimental results demonstrate the difficulty of dealing with very noisy test sets when only a few cleanly acquired samples are available for training.
ISBN: 978-3-030-29890-6
ISBN: 978-3-030-29891-3
Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4728133