The rapid adoption of Machine Learning (ML) technologies has introduced new challenges for code quality. Code smells, i.e., suboptimal design and implementation choices applied when developing source code, represent a particularly prevalent problem. While software engineering (SE) practices are often recommended to improve maintainability, their actual impact on code smells in ML projects remains unclear. In this paper, we present an evidence-based empirical study of 566 real-world Python ML projects from the NICHE dataset, labeled according to adherence to eight established SE practices. Using static analysis and statistical testing, we assess the relationship between these practices and the presence of ten Python-specific code smells. Our results show that projects adopting SE practices exhibit significantly fewer code smells. In particular, Continuous Integration is negatively correlated with the Complex Container Comprehension smell. These findings highlight the importance of engineering discipline in managing code quality in ML development.
An Evidence-Based Study on the Relationship of Software Engineering Practices on Code Smells in Python ML Projects
	
	
	
		
		
		
		
		
	
	
	
	
	
	
	
	
		
		
		
		
		
			
			
			
		
		
		
		
			
			
				
				
					
					
					
					
						
							
						
						
					
				
				
				
				
				
				
				
				
				
				
				
			
			
		
			
			
				
				
					
					
					
					
						
							
						
						
					
				
				
				
				
				
				
				
				
				
				
				
			
			
		
			
			
				
				
					
					
					
					
						
							
						
						
					
				
				
				
				
				
				
				
				
				
				
				
			
			
		
			
			
				
				
					
					
					
					
						
							
						
						
					
				
				
				
				
				
				
				
				
				
				
				
			
			
		
		
		
		
	
Giordano, Giammaria
;Della Porta, Antonio;Ferrucci, Filomena;Palomba, Fabio
			2026
Abstract
The rapid adoption of Machine Learning (ML) technologies has introduced new challenges for code quality. Code smells, i.e., suboptimal design and implementation choices applied when developing source code, represent a particularly prevalent problem. While software engineering (SE) practices are often recommended to improve maintainability, their actual impact on code smells in ML projects remains unclear. In this paper, we present an evidence-based empirical study of 566 real-world Python ML projects from the NICHE dataset, labeled according to adherence to eight established SE practices. Using static analysis and statistical testing, we assess the relationship between these practices and the presence of ten Python-specific code smells. Our results show that projects adopting SE practices exhibit significantly fewer code smells. In particular, Continuous Integration is negatively correlated with the Complex Container Comprehension smell. These findings highlight the importance of engineering discipline in managing code quality in ML development.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.


