Automatic Text Preprocessing for Intelligent Dialog Agents

Maisto, Alessandro (Methodology); Pelosi, Serena (Investigation); Polito, Massimiliano (Software); Stingo, Michele (Validation)
2019

Abstract

The paper describes a new text preprocessing pipeline based on a hybrid approach that combines rule-based and stochastic methods. The pipeline is part of a larger project, Big Data for Multi-Agent Specialized System, developed by Network Contacts in collaboration with the University of Salerno and other institutional partners. The project aims to build a hybrid question answering system made up of sets of dialog bots able to process large volumes of data. Because unstructured textual data is so important, the project focuses in particular on the automatic processing of text. The paper describes the three main modules of the preprocessing pipeline: a Style Correction Module, a Clitic Decomposition Module, and a POS Tagging and Lemmatization Module.
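The abstract does not say how the three modules are implemented. As a rough illustration only, the following Python sketch chains three stages in the order the abstract lists them, assuming Italian input. The first two stages are rule-based and the third is stochastic. Everything in it is hypothetical: the regex rules in `STYLE_RULES`, the clitic list `CLITICS`, the toy `VERB_LEXICON`, the tiny tagged corpus `TRAIN`, and the most-frequent-tag unigram tagger that stands in for the stochastic component. It is not the authors' implementation.

```python
import re
from collections import Counter, defaultdict

# --- Stage 1 (rule-based): hypothetical style-correction rules for chat-like text.
STYLE_RULES = [
    (re.compile(r"([!?.])\1+"), r"\1"),        # "!!!" -> "!" (also collapses "..." to ".")
    (re.compile(r"([^\W\d_])\1{2,}"), r"\1"),  # a letter repeated 3+ times: "ciaooo" -> "ciao"
    (re.compile(r"\s+"), " "),                  # collapse runs of whitespace
]

def correct_style(text):
    for pattern, repl in STYLE_RULES:
        text = pattern.sub(repl, text)
    return text.strip()

# --- Stage 2 (rule-based): Italian enclitic splitting checked against a toy verb lexicon.
CLITICS = ("glielo", "gliela", "glieli", "gliele", "gliene",
           "mi", "ti", "ci", "vi", "si", "lo", "la", "li", "le", "ne", "gli",
           "me", "te", "ce", "ve", "se")
VERB_LEXICON = {"porta", "prendi", "mangiare", "dare"}  # hypothetical lexicon

def split_clitics(word, max_clitics=2):
    """Return [verb, clitic, ...] if the word decomposes, else [word]."""
    def search(stem, found):
        if stem in VERB_LEXICON:
            return [stem] + found
        # Truncated infinitive before a clitic: "mangiar" + "lo" -> "mangiare"
        if found and stem + "e" in VERB_LEXICON:
            return [stem + "e"] + found
        if len(found) == max_clitics:
            return None
        # Strip clitics right to left and retry on the remaining stem
        for clitic in CLITICS:
            if stem.endswith(clitic) and len(stem) > len(clitic):
                result = search(stem[: -len(clitic)], [clitic] + found)
                if result:
                    return result
        return None
    return search(word, []) or [word]

# --- Stage 3 (stochastic): most-frequent-tag unigram tagger with lemma lookup,
# trained on a tiny hypothetical corpus of (token, tag, lemma) triples.
TRAIN = [
    ("porta", "VERB", "portare"), ("porta", "VERB", "portare"), ("porta", "NOUN", "porta"),
    ("me", "PRON", "mi"), ("lo", "PRON", "lo"), ("lo", "PRON", "lo"), ("lo", "DET", "il"),
    ("mangiare", "VERB", "mangiare"), ("ciao", "INTJ", "ciao"), ("!", "PUNCT", "!"),
]

def train_unigram(data):
    counts, lemmas = defaultdict(Counter), {}
    for token, tag, lemma in data:
        counts[token][tag] += 1
        lemmas[(token, tag)] = lemma
    # Each token gets the tag it was seen with most often
    tags = {token: c.most_common(1)[0][0] for token, c in counts.items()}
    return tags, lemmas

TAGS, LEMMAS = train_unigram(TRAIN)

def tag_and_lemmatize(tokens):
    out = []
    for token in tokens:
        tag = TAGS.get(token, "X")  # unseen tokens get the catch-all tag "X"
        out.append((token, tag, LEMMAS.get((token, tag), token)))
    return out

def preprocess(text):
    text = correct_style(text)
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    parts = [part for token in tokens for part in split_clitics(token)]
    return tag_and_lemmatize(parts)

if __name__ == "__main__":
    print(preprocess("Ciaooo!!!   Portamelo"))
```

For the input `"Ciaooo!!!   Portamelo"`, this sketch returns `[('ciao', 'INTJ', 'ciao'), ('!', 'PUNCT', '!'), ('porta', 'VERB', 'portare'), ('me', 'PRON', 'mi'), ('lo', 'PRON', 'lo')]`. Before accepting a split, the clitic stage checks the remaining stem against the lexicon, so ordinary words such as "ciao" are left alone. It does not handle forms with consonant doubling: "dammelo" comes back unsplit.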
Use this identifier to cite or link to this document: http://hdl.handle.net/11386/4724164