Improving Text Retrieval Accuracy Using a Graph of Terms

Fabio Clarizia; Francesco Colace; Luca Greco; Massimo De Santo
2011-01-01

Abstract

Expanding the original query with additional knowledge, for instance other topic-related terms, has been shown to increase the number of relevant documents returned by an informational query on a Web repository. In this paper we propose a new technique that, given a small set of documents on a topic, uses a probabilistic topic model to automatically build a query expansion based on a mixed Graph of Terms (mGT) representation. The mGT is composed of two levels: the conceptual level, a set of interconnected terms representing concepts (undirected edges), and the word level, the cloud of interconnected words specifying each concept (directed edges). An mGT is learnt automatically from the documents in two learning stages. We evaluated performance by comparing our search methodology with a classic one in which the query expansion consists only of the list of concepts and words composing the graph, so that the relations between them are not considered. The results show that our system retrieves more relevant web pages, independently of the topic.
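
The two-level structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `MixedGraphOfTerms`, its methods, the numeric edge weights and the example terms are all hypothetical, and the learning of the graph with the probabilistic topic model is not shown. The sketch only contrasts a flat expansion (the list of concepts and words, as in the baseline) with an expansion that keeps the relations between terms.

```python
from dataclasses import dataclass, field


@dataclass
class MixedGraphOfTerms:
    """Two-level term graph. Edge weights are an illustrative assumption."""

    # Conceptual level: undirected concept-concept edges (unordered pair -> weight).
    concept_edges: dict = field(default_factory=dict)
    # Word level: directed concept -> word edges (ordered pair -> weight).
    word_edges: dict = field(default_factory=dict)

    def link_concepts(self, a, b, weight):
        # frozenset makes (a, b) and (b, a) the same undirected edge.
        self.concept_edges[frozenset((a, b))] = weight

    def link_word(self, concept, word, weight):
        self.word_edges[(concept, word)] = weight

    def flat_expansion(self):
        """Baseline: every concept and word, with relations discarded."""
        terms = set()
        for pair in self.concept_edges:
            terms.update(pair)
        for concept, word in self.word_edges:
            terms.update((concept, word))
        return sorted(terms)

    def relational_expansion(self):
        """Graph-aware expansion: one weighted term pair per edge."""
        pairs = [(*sorted(p), w) for p, w in self.concept_edges.items()]
        pairs += [(c, word, w) for (c, word), w in self.word_edges.items()]
        # Strongest relations first.
        return sorted(pairs, key=lambda t: -t[2])


# Hypothetical graph for a topic about information retrieval.
graph = MixedGraphOfTerms()
graph.link_concepts("retrieval", "query", 0.8)
graph.link_word("query", "expansion", 0.6)
graph.link_word("retrieval", "ranking", 0.4)

print(graph.flat_expansion())
print(graph.relational_expansion())
```

How such weighted pairs would be translated into an actual search engine query is not stated in the abstract, so that step is left out here.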

Use this identifier to cite or link to this document: https://hdl.handle.net/11386/3036050