In the era of Web 2.0, the knowledge is the de-facto social currency in the global network environment. Knowledge is not an accumulation of data, but a relation-based representation of the information content, which needs to be distilled and arranged in a semantic infrastructure to guarantee interoperability and sharable understanding. In the light of this scenario, the paper introduces a semantically enhanced document retrieval system that describes each retrieved document with an ontological multi-grained network of the extracted conceptualization. The system is based on two well-known latent models: Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA): LSA provides a spatial distribution of the input documents, facilitating their retrieval, thanks to an ontological representation of their relationship network. LDA works instead at deeper level: it drives the ontological structuring of the knowledge inside the individual retrieved documents in terms of words, concepts and topics. The novelty of this approach is a multi-level granulation of the knowledge: from a document matching the query (coarse granularity), to the topics that join documents, until to the words describing a concept into a topic (fine granularity). The final result is a SKOS-based ontology, ad-hoc created for a document corpus; graphically supported for the navigation, it enables the exploration of the concepts at different granularity levels.

A semantic-grained perspective of latent knowledge modeling

SENATORE, Sabrina;LOIA, Vincenzo
2017-01-01

Abstract

In the era of Web 2.0, the knowledge is the de-facto social currency in the global network environment. Knowledge is not an accumulation of data, but a relation-based representation of the information content, which needs to be distilled and arranged in a semantic infrastructure to guarantee interoperability and sharable understanding. In the light of this scenario, the paper introduces a semantically enhanced document retrieval system that describes each retrieved document with an ontological multi-grained network of the extracted conceptualization. The system is based on two well-known latent models: Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA): LSA provides a spatial distribution of the input documents, facilitating their retrieval, thanks to an ontological representation of their relationship network. LDA works instead at deeper level: it drives the ontological structuring of the knowledge inside the individual retrieved documents in terms of words, concepts and topics. The novelty of this approach is a multi-level granulation of the knowledge: from a document matching the query (coarse granularity), to the topics that join documents, until to the words describing a concept into a topic (fine granularity). The final result is a SKOS-based ontology, ad-hoc created for a document corpus; graphically supported for the navigation, it enables the exploration of the concepts at different granularity levels.
2017
File in questo prodotto:
File Dimensione Formato  
mainPaper.pdf

Open Access dal 03/11/2018

Descrizione: versione accettata
Tipologia: Documento in Post-print (versione successiva alla peer review e accettata per la pubblicazione)
Licenza: Creative commons
Dimensione 2.77 MB
Formato Adobe PDF
2.77 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11386/4677755
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 17
  • ???jsp.display-item.citation.isi??? 13
social impact