A Text Retrieval Approach to Recover Links among E-Mails and Source Code Classes

Giuseppe Scanniello
2014

Abstract

During software development and evolution, communication among stakeholders is one of the most important activities. Stakeholders discuss various topics, ranging from low-level concerns (e.g., refactoring) to high-level resolutions (e.g., design rationale). To support such communication, e-mails are widely used in both commercial and open source software projects. Although several approaches have been proposed to recover links among software artifacts, very few are concerned with e-mails. Recovering links between e-mails and the software artifacts discussed in them is a nontrivial task. The main issue is the nature of the communication, which is scarcely structured and mostly informal. Many of the proposed approaches are based on text search or text retrieval and reformulate link recovery as a document retrieval problem. We refine and improve such solutions by leveraging the parts of which an e-mail is composed: header, current message, and previous messages. The relevance of these parts is weighted by a probabilistic approach based on text retrieval. The results of an empirical study conducted on a public benchmark indicate that the new approach in many cases outperforms the baselines: text retrieval and lightweight text search approaches.
Use this identifier to cite or link to this document: http://hdl.handle.net/11386/4779837