Context-aware profiling of concepts from a semantic topological space

CAVALIERE, Danilo; SENATORE, Sabrina; LOIA, Vincenzo
2017-01-01

Abstract

In the era of the Internet of “everything”, natural language text is still the undisputed medium for representing information, as evidenced by the pervasiveness of tweets, instant messages, posts, and documents. There is an increasing need for innovative technologies targeted at more machine-oriented communication. Many keyword-based and statistical approaches have supported information retrieval, data mining, and natural language processing systems, but a deeper understanding of text remains an urgent challenge: concepts, the semantic relationships among them, and the contextual information needed for concept disambiguation all require further progress in textual-information management. This work introduces a novel technique for extracting the main concepts from a text. Concepts are described by word-based connections arranged in a semantic topological space, built on a formal model, the simplicial complex. The model links the points, i.e., the words appearing in the text, and incrementally creates a geometrical structure describing concepts that are more or less specialized, depending on the aggregation distance of the words. The conceptual network is context-aware, since it reveals unambiguous concepts, specialized through analysis of the surrounding text. The framework that implements the approach discovers basic concepts, composed of the minimal number of words needed to describe a finite-sense concept, and richer extended concepts, built by adding further relations among terms. The final topological space provides a multi-granule concept representation: from a local, word-closeness view to a highly refined description. Experiments and comparative analysis validate the effectiveness of the approach, showing satisfactory performance in concept identification, with precision above 80% in most experiments and recall of around 60–70% on average, with peaks of 90% for some specific concept categories.
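The abstract does not state how the simplicial complex is constructed, so the following is only an illustrative sketch and not the authors' method. It assumes a Vietoris-Rips-style rule: a group of words forms a simplex when every pair in the group lies within a chosen aggregation distance. The word coordinates, the `rips_complex` function, and the threshold values are all hypothetical.

```python
from itertools import combinations
from math import dist

# Hypothetical 2-D coordinates standing in for word positions in a semantic space.
WORDS = {
    "bank": (0.0, 0.0),
    "money": (1.0, 0.0),
    "loan": (0.5, 0.8),
    "river": (5.0, 5.0),
}

def rips_complex(points, threshold, max_dim=2):
    """Build simplices (sorted tuples of words) whose vertices are pairwise within `threshold`.

    This is a Vietoris-Rips-style rule chosen for illustration only.
    """
    names = sorted(points)
    simplices = [(name,) for name in names]  # 0-simplices: the words themselves
    for size in range(2, max_dim + 2):       # edges, then triangles, up to max_dim
        for combo in combinations(names, size):
            close = all(
                dist(points[a], points[b]) <= threshold
                for a, b in combinations(combo, 2)
            )
            if close:
                simplices.append(combo)
    return simplices

# A larger aggregation distance links more words into a broader grouping;
# a smaller one leaves the words as isolated points.
coarse = rips_complex(WORDS, threshold=1.1)
fine = rips_complex(WORDS, threshold=0.5)
```

In this toy setup the larger threshold joins "bank", "loan", and "money" into a single triangle while "river" stays isolated, which loosely mirrors the idea that the surrounding words determine which sense of a term is in play.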

Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4686575