Text-independent voice recognition based on Siamese networks and fusion embeddings
De Prisco R.; Malandrino D.; Zaccagnino R.
2023
Abstract
The problem of identifying people from their voices has attracted increasing research activity, driven by the important practical applications of voice authentication. Many solutions exploit neural networks based on i-vectors and, more recently, on x-vectors, both of which are fixed-length embeddings computed from the input audio signal. In this paper we design and implement a novel voice recognition system based on the fusion of i-vectors and x-vectors. The recognition is text-independent: the user is recognized regardless of the words actually pronounced. We performed preliminary experiments to assess the effectiveness of the proposed solution. Results show that the proposed method improves performance over approaches based on i-vectors alone or x-vectors alone.
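The abstract does not state how the two embeddings are combined or compared. The following is a minimal sketch under two stated assumptions, not the authors' method. First, fusion is done at the embedding level by concatenating the L2-normalized i-vector and x-vector. Second, plain cosine scoring against a hypothetical threshold stands in for the learned Siamese comparison named in the title. The functions `fuse_embeddings` and `same_speaker` and the `threshold` value are illustrative names, not part of the paper.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def fuse_embeddings(i_vector: Sequence[float], x_vector: Sequence[float]) -> List[float]:
    """Hypothetical fusion: concatenate the L2-normalized i-vector and x-vector.

    Normalizing each part first gives both embeddings equal weight in the
    fused vector, whatever their original scale or dimensionality.
    """
    return l2_normalize(i_vector) + l2_normalize(x_vector)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def same_speaker(enrolled: Sequence[float], probe: Sequence[float],
                 threshold: float = 0.5) -> bool:
    """Accept the probe if its fused embedding is close enough to the enrolled one.

    The threshold is a hypothetical placeholder; in practice it would be
    tuned on held-out data. Cosine scoring here replaces the Siamese network.
    """
    return cosine_similarity(enrolled, probe) >= threshold


if __name__ == "__main__":
    # Toy 2-dim "i-vectors" and 3-dim "x-vectors"; real ones have hundreds of dimensions.
    enrolled = fuse_embeddings([1.0, 0.0], [0.0, 1.0, 0.0])
    probe = fuse_embeddings([1.0, 0.0], [0.0, 0.8, 0.6])
    score = cosine_similarity(enrolled, probe)
    print(f"score={score:.3f} accepted={same_speaker(enrolled, probe)}")
```

Because each part is normalized before concatenation, the cosine score of two fused vectors equals the average of the i-vector cosine score and the x-vector cosine score. Under this choice, neither embedding dominates the decision.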