Information Retrieval (IR) techniques have gained wide-spread acceptance as a method for automating traceability recovery. These techniques recover links between software artifacts based on their textual similarity, i.e., the higher the similarity, the higher the likelihood that there is a link between the two artifacts. A common problem with all IR-based techniques is filtering out noise from the list of candidate links, in order to improve the recovery accuracy. Indeed, software artifacts may be related in many ways and the textual information captures only one aspect of their relationships. In this paper we propose to leverage code ownership information to capture relationships between source code artifacts for improving the recovery of traceability links between documentation and source code. Specifically, we extract the author of each source code component and for each author we identify the “context” she worked on. Thus, for a given query from the external documentation we compute the similarity between it and the context of the authors. When retrieving classes that relate to a specific query using a standard IR-based approach we reward all the classes developed by the authors having their context most similar to the query, by boosting their similarity to the query. The proposed approach, named TYRION (TraceabilitY link Recovery using Information retrieval and code OwNership), has been instantiated for the recovery of traceability links between use cases and Java classes of two software systems. The results indicate that code ownership information can be used to improve the accuracy of an IR-based traceability link recovery technique.

Using Code Ownership to Improve IR-based Traceability Link Recovery

DE LUCIA, Andrea
2013

Abstract

Information Retrieval (IR) techniques have gained wide-spread acceptance as a method for automating traceability recovery. These techniques recover links between software artifacts based on their textual similarity, i.e., the higher the similarity, the higher the likelihood that there is a link between the two artifacts. A common problem with all IR-based techniques is filtering out noise from the list of candidate links, in order to improve the recovery accuracy. Indeed, software artifacts may be related in many ways and the textual information captures only one aspect of their relationships. In this paper we propose to leverage code ownership information to capture relationships between source code artifacts for improving the recovery of traceability links between documentation and source code. Specifically, we extract the author of each source code component and for each author we identify the “context” she worked on. Thus, for a given query from the external documentation we compute the similarity between it and the context of the authors. When retrieving classes that relate to a specific query using a standard IR-based approach we reward all the classes developed by the authors having their context most similar to the query, by boosting their similarity to the query. The proposed approach, named TYRION (TraceabilitY link Recovery using Information retrieval and code OwNership), has been instantiated for the recovery of traceability links between use cases and Java classes of two software systems. The results indicate that code ownership information can be used to improve the accuracy of an IR-based traceability link recovery technique.
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: http://hdl.handle.net/11386/4045454
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 25
  • ???jsp.display-item.citation.isi??? ND
social impact