Textual Analysis and Software Quality: Challenges and Opportunities

Bavota, G.; De Lucia, Andrea; Oliveto, R.; Palomba, Fabio; Panichella, Annibale

Source code lexicon (identifier names and comments) has been used, as an alternative or as a complement to source code structure, to perform various kinds of analyses (e.g., traceability recovery). These successful applications have increased, in recent years, the interest in using textual analysis to assess and improve the quality of a software system. In particular, textual analysis could be used to identify refactoring opportunities or ambiguous identifiers that may increase the program comprehension burden by creating a mismatch between the developers' cognitive model and the intended meaning of the term, thus ultimately increasing fault proneness. In addition, when used “on-line” during software development, textual analysis could guide programmers to select better identifiers, thereby improving the quality of the source code lexicon. In this paper, we overview research in textual analysis for the assessment and improvement of software quality and discuss our achievements to date, the challenges, and the opportunities for the future.