DiffGAN: A Test Generation Approach for Differential Testing of Deep Neural Networks for Image Analysis

Research output: Contribution to journal › Journal article › Peer-reviewed

Abstract

Deep Neural Networks (DNNs) are increasingly deployed across a wide range of applications, from image classification to autonomous driving. However, ensuring their reliability remains a challenge, and in many situations, alternative models with similar functionality and accuracy levels are available. Traditional accuracy-based evaluations often fail to capture behavioral differences between such models, particularly when testing datasets are limited, making it challenging to select or optimally combine models. Differential testing addresses this limitation by generating test inputs that expose discrepancies in the behavior of DNN models. However, existing differential testing approaches face significant limitations: many rely on access to model internals or are constrained by the availability of seed inputs, limiting their generalizability and effectiveness. In response to these challenges, we propose DiffGAN, a black-box test generation approach for differential testing of DNN models. Our approach, though adaptable to other domains, is specific to DNN models for image classification tasks, a highly prevalent application area. Our method relies on a Generative Adversarial Network (GAN) and the Non-dominated Sorting Genetic Algorithm II (NSGA-II) to generate diverse and valid triggering inputs that effectively reveal behavioral discrepancies between models. It employs two custom fitness functions, one focused on diversity and the other on divergence, to guide the exploration of the GAN input space and identify discrepancies between the models' outputs. By strategically searching the GAN input space, we show that DiffGAN can effectively generate inputs with specific features that trigger differences in behavior for the models under test. Unlike traditional white-box methods, DiffGAN does not require access to the internal structure of the models, which makes it applicable to a wider range of situations. We evaluate DiffGAN on a benchmark comprising eight pairs of DNN models trained on two widely used image classification datasets. Our results demonstrate that DiffGAN significantly outperforms a state-of-the-art (SOTA) baseline, generating four times more triggering inputs, with higher diversity and validity, within the same testing budget. Furthermore, we show that the generated inputs can be used to improve the accuracy of a machine learning-based model selection mechanism, which dynamically selects the best-performing model based on input characteristics and can thus serve as a smart output voting mechanism when alternative models are used together.
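To make the search loop described above concrete, the following sketch (not the authors' implementation) illustrates how a GAN's latent space might be searched with NSGA-II for inputs on which two classifiers disagree. The generator, model_a, and model_b objects are toy stand-ins, the pymoo dependency is an assumption, and the L1 output distance and pixel-space diversity measures are placeholders for the paper's custom divergence and diversity fitness functions.

import numpy as np
import torch
from pymoo.algorithms.moo.nsga2 import NSGA2
from pymoo.core.problem import Problem
from pymoo.optimize import minimize

LATENT_DIM = 100  # dimensionality of the GAN latent space (assumption)

class DifferentialTestProblem(Problem):
    """Two objectives, both negated because pymoo minimizes:
    (1) divergence between the two models' predictions on a generated image,
    (2) diversity of that image with respect to the rest of the population."""

    def __init__(self, generator, model_a, model_b):
        super().__init__(n_var=LATENT_DIM, n_obj=2, xl=-3.0, xu=3.0)
        self.generator, self.model_a, self.model_b = generator, model_a, model_b

    def _evaluate(self, X, out, *args, **kwargs):
        z = torch.as_tensor(X, dtype=torch.float32)
        with torch.no_grad():
            images = self.generator(z)                       # (pop, C, H, W)
            pa = torch.softmax(self.model_a(images), dim=1)  # black-box: outputs only
            pb = torch.softmax(self.model_b(images), dim=1)
        divergence = (pa - pb).abs().sum(dim=1)              # L1 distance between predictions
        flat = images.flatten(1)
        diversity = torch.cdist(flat, flat).mean(dim=1)      # mean pixel-space distance to population
        out["F"] = np.column_stack([-divergence.numpy(), -diversity.numpy()])

# Toy stand-ins for the pretrained GAN generator and the two models under test.
generator = torch.nn.Sequential(torch.nn.Linear(LATENT_DIM, 28 * 28), torch.nn.Tanh(),
                                torch.nn.Unflatten(1, (1, 28, 28)))
model_a = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
model_b = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))

result = minimize(DifferentialTestProblem(generator, model_a, model_b),
                  NSGA2(pop_size=50), ("n_gen", 30), seed=1, verbose=False)

# Keep latent vectors whose generated images the two models label differently.
z = torch.as_tensor(np.atleast_2d(result.X), dtype=torch.float32)
with torch.no_grad():
    imgs = generator(z)
    triggering = imgs[model_a(imgs).argmax(dim=1) != model_b(imgs).argmax(dim=1)]
print(f"{len(triggering)} triggering inputs found")

Because only the models' outputs are queried, the loop stays black-box, which is the property the abstract emphasizes over white-box approaches.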

Original language: English
Pages (from - to): 3284-3309
Number of pages: 26
Journal: IEEE Transactions on Software Engineering
Volume: 51
Issue number: 12
DOIs
Status: Published - 2025
