Data-Information-Concept Continuum From a Text Mining Perspective
Cavaliere, Danilo; Senatore, Sabrina; Loia, Vincenzo
2018-01-01
Abstract
The recent Web panorama reveals a tangible proliferation of “social” data, in the form of posts, opinions, feelings, and experiences. Most of the available data is unstructured text, which is difficult for computers to process, especially because of the ambiguity and vagueness of natural language. Research developments highlight the difficulty of capturing the semantics of terms, linguistic expressions, and sentences, and of representing them as finite concepts. This article presents an open-minded overview of Text Mining approaches aimed at transforming unstructured textual data into explicit knowledge, with a special focus on conceptualization, i.e., the identification of concepts by analysing the syntactic and semantic relations among terms as well as the surrounding contextual information. Different levels of knowledge granulation are described in a layered knowledge model, where the term, the information, and the concept represent the basic knowledge granules that cover most Text Mining approaches, in an evolving knowledge continuum.