Assessing the effectiveness of ROUGE as unbiased metric in Extractive vs. Abstractive summarization techniques
Auriemma Citarella A.; Ciobanu M. G.; De Marco F.; Di Biasi L.; Tortora G.
2025
Abstract
Approaches to Automatic Text Summarization (TS) aim to extract key information from one or more input texts and generate summaries whilst preserving the meaning of the content. These strategies are divided into two groups, Extractive and Abstractive, which differ in how they operate: extractive summarization selects sentences directly from the document text, whereas abstractive summarization creates a summary by interpreting the text and rewriting sentences, often with new words. It is important to assess and confirm how similar a summary is to the original text independently of the particular TS algorithm adopted. The literature proposes various metrics and scores for evaluating text summarization results, and ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is the most widely used. In this study, our main objective is to evaluate how the ROUGE metric performs when applied to both Extractive and Abstractive summarization algorithms, and to understand its effectiveness and reliability as an independent and unbiased metric for assessing the quality of the summaries these different approaches generate. We conducted a first experiment to compare the metric's effectiveness (ROUGE-1, ROUGE-2, and ROUGE-L) in evaluating Abstractive (word2vec, doc2vec, and GloVe) versus Extractive Text Summarization algorithms (TextRank, LSA, Luhn, LexRank), and a second experiment to compare the scores obtained by two different summarization strategies: a single execution of one summarization algorithm versus the sequential execution of two different algorithms on the same text. Our evaluation of the ROUGE metric revealed that it reaches similar results for Abstractive and Extractive algorithms. Moreover, our findings indicate that multiple executions, in which two text summarization algorithms are run sequentially on the same text, generally outperform single executions of a single algorithm.
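To make the comparison in the second experiment concrete, below is a minimal sketch, not the authors' exact setup: it assumes the Python sumy package for the extractive summarizers (TextRank, LexRank) and Google's rouge-score package for ROUGE-1/2/L. The input file names, summary lengths, and the specific reading of "multiple execution" (the second algorithm applied to the first algorithm's output) are illustrative assumptions.

```python
# Minimal sketch (assumptions labeled below, not the paper's exact configuration):
# compare ROUGE scores for a single summarizer run versus two summarizers
# applied sequentially. Requires: pip install sumy rouge-score
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.text_rank import TextRankSummarizer
from sumy.summarizers.lex_rank import LexRankSummarizer
from rouge_score import rouge_scorer

def summarize(text: str, summarizer, n_sentences: int) -> str:
    """Run a sumy extractive summarizer and join the selected sentences."""
    parser = PlaintextParser.from_string(text, Tokenizer("english"))
    return " ".join(str(s) for s in summarizer(parser.document, n_sentences))

document = open("document.txt").read()            # hypothetical input text
reference = open("reference_summary.txt").read()  # hypothetical gold summary

# Single execution: one pass of TextRank over the original text.
single = summarize(document, TextRankSummarizer(), n_sentences=5)

# Multiple execution (one plausible reading of the paper's setup): a second
# algorithm, LexRank, is applied to the first algorithm's output.
intermediate = summarize(document, TextRankSummarizer(), n_sentences=10)
multiple = summarize(intermediate, LexRankSummarizer(), n_sentences=5)

# Score both candidates against the reference with ROUGE-1, ROUGE-2, ROUGE-L.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
for label, candidate in [("single", single), ("multiple", multiple)]:
    scores = scorer.score(reference, candidate)  # argument order: (target, prediction)
    print(label, {name: round(s.fmeasure, 3) for name, s in scores.items()})
```

Under this reading, the "multiple execution" strategy is a two-stage pipeline whose intermediate summary length is a free parameter; the abstract's finding that sequential runs generally outperform single runs would correspond here to the second candidate obtaining higher ROUGE F-measures.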