
No silver bullet in software analytics: Understanding the impact of model tuning metrics on the performance of software defects prediction models

  • Concordia University

Research output: Contribution to journal; article published in a peer-reviewed journal

Abstract

Software analytics leverages machine learning models to extract insights from historical data on software projects. These models come with configurable parameters, known as hyperparameters, which govern their characteristics, such as the number of trees in a random forest. Hyperparameter optimization is crucial for achieving optimal performance in several software engineering problems, such as software defect prediction (SDP). To perform hyperparameter optimization, an appropriate tuning metric must be chosen to guide the search for the best hyperparameter settings. However, the impact of the chosen tuning metric on models’ performance remains unexplored. In this paper, we address this gap by examining the impact of the hyperparameter tuning metric on the performance of software analytics models, using SDP as a case study. First, we investigate 105 previously published SDP studies to understand whether researchers report the tuning metrics they employ. To further understand the impact of hyperparameter tuning metrics on model performance, we conduct an empirical study on an SDP dataset comprising 28 releases, tuning and evaluating 4 widely used models with 8 tuning metrics and 3 common performance metrics. Our literature review reveals that researchers report the tuning metric they used in only 29% of cases, which threatens the replicability of most SDP studies. Our empirical study results unveil several important findings: (i) selecting the appropriate tuning metric can enhance SDP model performance by up to 150% (this was observed for K-nearest neighbor when evaluated with the MCC score), (ii) the tuning metrics can be conflicting, exhibiting a high degree of model rank inconsistency in 7% of the cases, and (iv) training data attributes such as data complexity and class overlap can be good indicators of the performance of the different tuning metrics.
Hence, researchers are encouraged to report their chosen tuning metrics to improve study reproducibility. Practitioners are advised to explore multiple tuning metrics to find the best-performing model. Furthermore, we recommend paying attention to tuning metrics with high rank inconsistency, since doing so might lead to a significant performance improvement. Finally, researchers and practitioners are also encouraged to further study how the different tuning metrics behave.

Original language: English
Article number: 127
Journal: Empirical Software Engineering
Volume: 31
Issue number: 5
State: Published - Sep 2026
