Semantic summarization of news from heterogeneous sources
d'Acierno A.;Colace F.;
2017-01-01
Abstract
Summarization techniques are becoming an essential part of everyday life, mainly because summaries let users reach the information they need in less time. In this paper, we present a general framework for retrieving relevant information from news articles and a novel summarization algorithm based on a deep semantic analysis of texts. In particular, a set of (subject, predicate, object) triples is extracted from each document and then used to build a summary through an unsupervised clustering algorithm that exploits the notion of semantic similarity. Finally, we use the cluster centroids, together with some heuristics, to determine the most significant sentences for the summary. Several experiments, carried out using the standard DUC methodology and the ROUGE software, show that the proposed method outperforms several summarization systems in terms of recall and readability.
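The pipeline described in the abstract (triples, clustering by similarity, sentence selection from cluster centroids) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the similarity measure, the clustering algorithm, or the heuristics, so everything below is an assumption made for illustration. Token-level Jaccard overlap stands in for semantic similarity, a greedy single-pass procedure stands in for the clustering step, a medoid stands in for the centroid, and the names `Triple`, `similarity`, `cluster`, `medoid`, and `summarize` are hypothetical. The triples are written by hand rather than extracted from text.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str
    sentence_id: int  # index of the sentence the triple was taken from


def tokens(t):
    """Lower-cased word set of a triple's three fields."""
    return set(f"{t.subject} {t.predicate} {t.obj}".lower().split())


def similarity(a, b):
    """Jaccard overlap: an assumed stand-in for a semantic similarity measure."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def cluster(triples, threshold=0.3):
    """Greedy single pass: a triple joins the first cluster whose first
    member is similar enough, otherwise it starts a new cluster."""
    clusters = []
    for t in triples:
        for c in clusters:
            if similarity(t, c[0]) >= threshold:
                c.append(t)
                break
        else:
            clusters.append([t])
    return clusters


def medoid(c):
    """Cluster member with the highest total similarity to the cluster."""
    return max(c, key=lambda t: sum(similarity(t, o) for o in c))


def summarize(sentences, triples, max_sentences=2):
    """Pick the source sentence of each medoid, largest clusters first."""
    chosen = []
    for c in sorted(cluster(triples), key=len, reverse=True):
        sid = medoid(c).sentence_id
        if sid not in chosen:
            chosen.append(sid)
        if len(chosen) == max_sentences:
            break
    return [sentences[i] for i in sorted(chosen)]


# Toy input: sentences and hand-written (not automatically extracted) triples.
sentences = [
    "The council approved the new budget.",
    "The city council approved the budget on Monday.",
    "Heavy rain flooded the river valley.",
    "Rain flooded the valley overnight.",
    "The council approved the budget after a long debate.",
]
triples = [
    Triple("council", "approved", "budget", 0),
    Triple("city council", "approved", "budget", 1),
    Triple("rain", "flooded", "river valley", 2),
    Triple("rain", "flooded", "valley", 3),
    Triple("council", "approved", "budget", 4),
]

summary = summarize(sentences, triples)
print(summary)
```

On this toy input the triples fall into two clusters (budget approval and flooding), and the sketch returns one sentence per cluster. A real system would replace `similarity` with a genuine semantic measure and obtain the triples from a parser or information extraction tool.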