Automatic Text Preprocessing for Intelligent Dialog Agents
Maisto, Alessandro (Methodology)
Pelosi, Serena (Investigation)
Polito, Massimiliano (Software)
Stingo, Michele (Validation)
2019
Abstract
The paper describes a new text preprocessing pipeline based on a hybrid approach that combines rule-based and stochastic methods. The pipeline is part of a larger project, Big Data for Multi-Agent Specialized System, developed by Network Contacts in collaboration with the University of Salerno and other institutional partners. The aim of the project is to build a hybrid question answering system composed of sets of dialog bots able to process large volumes of data. Given the importance of unstructured textual data, the project focuses in particular on automatic text processing. The paper describes the three main modules of the preprocessing pipeline: a Style Correction Module, a Clitic Decomposition Module, and a POS Tagging and Lemmatization Module.
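
The three modules named in the abstract can be pictured as stages applied in sequence. The following is only a minimal sketch of that arrangement, not the authors' implementation. It assumes Italian input, which the abstract does not state. All function names, the clitic list, and the toy lexicons are hypothetical, and the dictionary lookup in the last stage merely stands in for the stochastic tagger the abstract mentions.

```python
import re
from typing import List, Tuple

# Hypothetical toy resources, for illustration only.
CLITICS = ("lo", "la", "li", "le", "mi", "ti", "ci", "vi", "ne", "si")
INFINITIVES = {"mangiare", "vedere", "sentire"}
LEXICON = {
    "voglio": ("VERB", "volere"),
    "mangiare": ("VERB", "mangiare"),
    "lo": ("PRON", "lo"),
}


def style_correction(text: str) -> str:
    """Lowercase, collapse runs of 3+ identical characters, normalize spaces."""
    text = re.sub(r"(.)\1{2,}", r"\1", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def clitic_decomposition(tokens: List[str]) -> List[str]:
    """Split 'infinitive + clitic' forms, checked against the toy lexicon."""
    out: List[str] = []
    for tok in tokens:
        for clitic in CLITICS:
            stem = tok[: -len(clitic)]
            # Split only if restoring the final 'e' gives a known infinitive.
            if tok.endswith(clitic) and stem + "e" in INFINITIVES:
                out.extend([stem + "e", clitic])
                break
        else:
            out.append(tok)
    return out


def tag_and_lemmatize(tokens: List[str]) -> List[Tuple[str, str, str]]:
    """Return (token, POS, lemma); unknown tokens get POS 'X' and themselves as lemma."""
    return [(tok, *LEXICON.get(tok, ("X", tok))) for tok in tokens]


def preprocess(text: str) -> List[Tuple[str, str, str]]:
    """Run the three stages in order on whitespace-separated tokens."""
    cleaned = style_correction(text)
    tokens = clitic_decomposition(cleaned.split())
    return tag_and_lemmatize(tokens)


print(preprocess("Voglio   MANGIARLOOO"))
```

Checking each candidate split against a lexicon keeps the rule from splitting ordinary words that merely end in a clitic-like string. A real system would need a far larger lexicon and rules for other verb forms.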