Context-aware data augmentation for enhanced speech command recognition in industrial environments

De Simone, Giuseppe; Greco, Antonio; Rosa, Francesco; Saggese, Alessia; Vento, Mario
2025

Abstract

In Human-Robot Interaction, speech is one of the most intuitive and effective communication channels. In Industry 4.0, speech-based communication can significantly enhance productivity and efficiency on production lines. However, deploying a Speech Command Recognition module in real-world industrial settings is challenging, because the system must balance two conflicting objectives: accurately recognizing commands while rejecting noise and irrelevant speech. To address this, we propose a modular framework designed to optimize recognition accuracy and rejection robustness while minimizing the need for extensive industrial dataset collection. The framework features an efficient Command Recognition module trained on laboratory-collected data augmented with synthetic samples. Context-aware data augmentation techniques and dynamic noise injection further enhance the model’s robustness. To improve reliability in noisy environments, a Keyword Spotting module activates the recognition system only when a predefined keyword is detected. The proposed system was evaluated on real-world samples collected in a noisy industrial setting. The results show high recall for both command recognition and noise rejection, confirming the system’s effectiveness in meeting the demands of industrial applications.
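
As an illustration of the dynamic noise injection mentioned in the abstract, the following is a minimal Python sketch and not the authors' implementation. It assumes that "dynamic" means drawing a noise clip and a signal-to-noise ratio at random each time a training sample is loaded; the function names `mix_at_snr` and `dynamic_noise_injection`, the `noise_bank` list, and the 0 to 20 dB range are all hypothetical choices made for the example.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the mixture has the requested SNR in dB."""
    # Tile or trim the noise so that it matches the speech length.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Gain that brings the noise power to speech_power / 10^(snr_db / 10).
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + gain * noise


def dynamic_noise_injection(speech, noise_bank, rng, snr_range=(0.0, 20.0)):
    """Pick a random noise clip and a random SNR (assumed range), then mix."""
    noise = noise_bank[rng.integers(len(noise_bank))]
    snr_db = rng.uniform(*snr_range)
    return mix_at_snr(speech, noise, snr_db), snr_db
```

Calling `dynamic_noise_injection` inside a data loader, rather than precomputing noisy files, would expose the model to a different noise condition on every epoch, which is one common way to reduce the amount of in-situ industrial recordings needed.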

Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4913916