A speech denoising demonstration system using multi-model deep-learning neural networks
Castiglione, Aniello;
In press
Abstract
Acoustic noise interferes with speech signals in natural environments, degrading speech quality. Speech denoising aims to remove noise effectively while preserving speech components, and noise estimation is critical to this task: overestimating the noise spectral level distorts the speech components, whereas underestimating it fails to remove the noise effectively, leaving substantial residual noise in the denoised speech and lowering its quality. This article presents a multi-model deep-learning neural network (MDNN) for speech enhancement. First, a harmonic convolutional neural network (harmonic-CNN) classifies speech and noise segments from spectrograms, with targets manually labeled according to harmonic properties. A speech deep-learning neural network (speech-DNN) then improves the harmonic-CNN's recognition accuracy, applying robust speech features such as energy variation and zero-crossing rate to classify speech and noise segments. The noise level is overestimated in speech-pause segments to suppress noise spectra effectively in the enhanced speech, and underestimated in speech-presence frames to reduce speech distortion. Experimental results show that the proposed MDNN accurately classifies speech and noise segments and effectively reduces interfering noise.
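The core mechanism described in the abstract is a segment-dependent noise estimate: subtract more noise in frames classified as speech pauses and less in frames classified as speech presence. The sketch below illustrates that idea with plain spectral subtraction in Python, using a simple energy and zero-crossing-rate decision as a stand-in for the paper's harmonic-CNN and speech-DNN; all function names, thresholds, and subtraction factors are assumptions for illustration, not the authors' implementation.

```python
# Sketch of segment-dependent spectral subtraction: over-subtract the noise
# estimate in noise-only frames, under-subtract it in speech frames.
# The frame classifier here is a crude energy/ZCR rule (assumed thresholds),
# standing in for the paper's harmonic-CNN and speech-DNN models.
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of sign changes within one time-domain frame."""
    signs = np.sign(frame)
    signs[signs == 0] = 1
    return np.mean(signs[:-1] != signs[1:])

def classify_frames(frames, energy_thresh=1e-3, zcr_thresh=0.25):
    """Crude per-frame speech/noise decision (hypothetical thresholds)."""
    decisions = []
    for frame in frames:
        energy = np.mean(frame ** 2)
        zcr = zero_crossing_rate(frame)
        decisions.append(energy > energy_thresh and zcr < zcr_thresh)
    return np.array(decisions)  # True = speech-presence frame

def denoise(signal, frame_len=512, hop=256, alpha_pause=2.0, alpha_speech=0.8):
    """Spectral subtraction with a segment-dependent subtraction factor:
    alpha_pause > 1 over-subtracts in noise-only frames,
    alpha_speech < 1 under-subtracts in speech frames."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectra = np.fft.rfft(frames, axis=1)
    mags, phases = np.abs(spectra), np.angle(spectra)

    speech_flags = classify_frames(frames)
    # Estimate the noise magnitude spectrum from frames classified as noise.
    noise_frames = mags[~speech_flags] if np.any(~speech_flags) else mags
    noise_mag = noise_frames.mean(axis=0)

    # Subtract the scaled noise estimate; keep a small spectral floor.
    alphas = np.where(speech_flags, alpha_speech, alpha_pause)[:, None]
    clean_mag = np.maximum(mags - alphas * noise_mag, 0.05 * mags)

    # Resynthesize with the noisy phase and overlap-add.
    clean_frames = np.fft.irfft(clean_mag * np.exp(1j * phases),
                                n=frame_len, axis=1)
    out = np.zeros(len(signal))
    for i, frame in enumerate(clean_frames):
        out[i * hop:i * hop + frame_len] += frame * window
    return out
```

In this sketch the over-subtraction factor plays the role described in the abstract: a larger factor in speech-pause frames removes more residual noise, while a smaller factor in speech-presence frames limits speech distortion.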