
No silver bullet in software analytics: Understanding the impact of model tuning metrics on the performance of software defects prediction models

  • Concordia University

Research output: Contribution to journal › Journal Article › peer-review

Abstract

Software analytics leverages machine learning models to extract insights from historical data on software projects. These models come with configurable parameters, known as hyperparameters, which govern their characteristics, such as the number of trees in a random forest. Hyperparameter optimization is crucial for achieving optimal performance in several software engineering problems, such as software defect prediction (SDP). To perform hyperparameter optimization, an appropriate tuning metric should be set to guide the search for the optimal hyperparameter settings. However, the impact of the chosen tuning metric on models’ performance remains unexplored. In this paper, we address this gap by examining the impact of the hyperparameter tuning metric on the performance of software analytics models, using SDP as a case study. First, we investigate 105 previously published SDP studies to understand whether researchers report the employed tuning metrics. Then, to further understand the impact of hyperparameter tuning metrics on model performance, we conduct an empirical study on an SDP dataset comprising 28 releases, tuning and evaluating 4 widely used models using 8 tuning metrics and 3 common performance metrics. Our literature review reveals that researchers report the tuning metric used in only 29% of the cases, which poses a threat to the replicability of most SDP studies. Our empirical study results unveil several important findings: (i) selecting the appropriate tuning metric can enhance SDP model performance by up to 150% (observed for K-nearest neighbor when evaluated with the MCC score), (ii) the tuning metrics can be conflicting, exhibiting a high degree of model rank inconsistency in 7% of the cases, and (iv) training data attributes such as data complexity and class overlap can be good indicators of the performance of the different tuning metrics.
Hence, researchers are encouraged to report their chosen tuning metrics to improve study reproducibility. Practitioners are advised to explore multiple tuning metrics to find the best-performing model. Furthermore, attention to tuning metrics with high rank inconsistency is recommended, since it might lead to a significant performance improvement. Finally, researchers and practitioners are also encouraged to further understand how the different tuning metrics behave.
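
To make the notion of a tuning metric concrete, the following is a minimal sketch and not the study's actual setup. It tunes the number of neighbors of a small hand-written K-nearest neighbor classifier under three candidate tuning metrics, then scores each tuned model with MCC on held-out data. The synthetic data generator, the candidate values of k, the choice of accuracy, F1 and MCC as tuning metrics, and all function names are assumptions made for illustration.

```python
import math
import random

def confusion(y_true, y_pred):
    # Counts of true positives, true negatives, false positives, false negatives.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(y_true, y_pred):
    tp, tn, _, _ = confusion(y_true, y_pred)
    return (tp + tn) / len(y_true)

def f1(y_true, y_pred):
    tp, _, fp, fn = confusion(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(y_true, y_pred):
    # Matthews correlation coefficient; defined as 0 when the denominator vanishes.
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def knn_predict(train_X, train_y, x, k):
    # Majority vote among the k training points closest to x.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = sum(train_y[i] for i in order[:k])
    return 1 if 2 * votes > k else 0

def make_data(n, seed):
    # Hypothetical imbalanced data: roughly 20% "defective" (label 1) modules.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.2 else 0
        center = 1.0 if label else 0.0
        X.append((rng.gauss(center, 1.0), rng.gauss(center, 1.0)))
        y.append(label)
    return X, y

def tune_k(train, valid, candidates, metric):
    # Pick the k that maximizes the given tuning metric on the validation split.
    (train_X, train_y), (valid_X, valid_y) = train, valid
    scores = {}
    for k in candidates:
        preds = [knn_predict(train_X, train_y, x, k) for x in valid_X]
        scores[k] = metric(valid_y, preds)
    return max(candidates, key=lambda k: scores[k]), scores

train, valid, test = make_data(200, 0), make_data(100, 1), make_data(100, 2)
candidates = [1, 3, 5, 9, 15]
results = {}
for name, metric in {"accuracy": accuracy, "f1": f1, "mcc": mcc}.items():
    best_k, _ = tune_k(train, valid, candidates, metric)
    test_preds = [knn_predict(train[0], train[1], x, best_k) for x in test[0]]
    results[name] = (best_k, mcc(test[1], test_preds))
    print(f"tuned with {name}: k={best_k}, test MCC={results[name][1]:.3f}")
```

Whether the three tuning metrics select different values of k depends on the data; the sketch only shows where the tuning metric enters the procedure and why it should be reported alongside the evaluation metric.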

Original language: English
Article number: 127
Journal: Empirical Software Engineering
Volume: 31
Issue number: 5
Publication status: Published - Sept 2026

Keywords

  • Empirical software engineering
  • Hyper-parameters tuning
  • Machine learning
  • Software analytics
  • Software defect prediction
