Improving Source Code Lexicon via Traceability and Information Retrieval

De Lucia, Andrea; Massimilano Di Penta,; Oliveto, Rocco

The paper presents an approach helping developers to maintain source code identifiers and comments consistent with high-level artifacts. Specifically, the approach computes and shows the textual similarity between source code and related high-level artifacts. Our conjecture is that developers are induced to improve the source code lexicon, i.e., terms used in identifiers or comments, if the software development environment provides information about the textual similarity between the source code under development and the related high-level artifacts. The proposed approach also recommends candidate identifiers built from high-level artifacts related to the source code under development and has been implemented as an Eclipse plug-in, called COde Comprehension Nurturant Using Traceability (COCONUT). The paper also reports on two controlled experiments performed with master’s and bachelor’s students. The goal of the experiments is to evaluate the quality of identifiers and comments (in terms of their consistency with high-level artifacts) in the source code produced when using or not using COCONUT. The achieved results confirm our conjecture that providing the developers with similarity between code and high-level artifacts helps to improve the quality of source code lexicon. This indicates the potential usefulness of COCONUT as a feature for software development environments.