Fairness on a budget, across the board: A cost-effective evaluation of fairness-aware practices across contexts, tasks, and sensitive attributes
Parziale, Alessandra; Voria, Gianmario; Giordano, Giammaria; Catolino, Gemma; Palomba, Fabio
2025
Abstract
Context: Machine Learning (ML) is widely used in critical domains such as finance, healthcare, and criminal justice, where unfair predictions can lead to harmful outcomes. Although the Software Engineering (SE) community has developed bias mitigation techniques, their practical adoption remains limited due to complexity and integration issues. As a simpler alternative, fairness-aware practices have recently been proposed: conventional ML engineering techniques adapted to promote fairness, such as MinMax Scaling, which normalizes feature values so that attributes linked to sensitive groups do not disproportionately influence predictions. Their actual impact, however, is still largely unexplored. Objective: Building on our prior work exploring fairness-aware practices in different contexts, this paper extends the investigation through a large-scale empirical study assessing their effectiveness across diverse ML tasks, sensitive attributes, and datasets drawn from specific application domains. Methods: We conduct 5,940 experiments, evaluating fairness-aware practices from two perspectives: contextual bias mitigation and cost-effectiveness. The contextual evaluation examines fairness improvements across different ML models, sensitive attributes, and datasets; the cost-effectiveness analysis considers the trade-off between fairness gains and performance costs. Results: Our findings reveal that the effectiveness of fairness-aware practices depends on the specific context, i.e., the dataset and configuration at hand, while the cost-effectiveness analysis highlights the practices that best balance ethical gains and efficiency. Conclusion: These insights guide practitioners in choosing fairness-enhancing practices with minimal performance impact, supporting ethical ML development.
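To make the MinMax Scaling example concrete, the following is a minimal sketch (ours, not taken from the paper) of the practice applied before training; the dataset, column names, and model choice are hypothetical, and serve only to illustrate rescaling features to a common [0, 1] range so that large-magnitude attributes correlated with sensitive groups cannot dominate the decision function.

```python
# Minimal sketch of MinMax Scaling as a fairness-aware practice.
# Data and column names are hypothetical; any estimator could follow.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LogisticRegression

# Hypothetical tabular data: 'income' spans a far wider range than 'age',
# so without scaling it would dominate distance- or gradient-based learning.
df = pd.DataFrame({
    "age":    [23, 45, 31, 52, 37],
    "income": [18_000, 95_000, 42_000, 120_000, 60_000],
    "label":  [0, 1, 0, 1, 1],
})

X, y = df[["age", "income"]], df["label"]

# MinMaxScaler maps each feature to [0, 1]: x' = (x - min) / (max - min).
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

model = LogisticRegression().fit(X_scaled, y)
print(model.predict(X_scaled))
```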