TY - GEN
T1 - Imbalanced malware classification
T2 - 2025 IEEE Symposium on Computational Intelligence in Security, Defence and Biometrics Companion, CISDB Companion 2025
AU - Souza, Jose Vinicius S.
AU - Vieira, Camila Barbosa
AU - Cavalcanti, George D.C.
AU - Cruz, Rafael M.O.
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
AB - In recent years, the rise of cyber threats has emphasized the need for robust malware detection systems, especially on mobile devices. Malware, which targets vulnerabilities in devices and user data, represents a substantial security risk. A significant challenge in malware detection is the imbalance in datasets, where most applications are benign, with only a small fraction posing a threat. This study addresses the often-overlooked issue of class imbalance in malware detection by evaluating various machine learning strategies for detecting malware in Android applications. We assess monolithic classifiers and ensemble methods, focusing on dynamic selection algorithms, which have shown superior performance compared to traditional approaches. In contrast to balancing strategies performed on the whole dataset, we propose a balancing procedure that works individually for each classifier in the pool. Our empirical analysis demonstrates that the KNOP algorithm obtained the best results using a pool of Random Forest. Additionally, an instance hardness assessment revealed that balancing reduces the difficulty of the minority class and enhances the detection of the minority class (malware). The code used for the experiments is available at https://github.com/jvss2/Machine-Learning-Empirical-Evaluation.
KW - Android security
KW - Data Balance
KW - Embedding
KW - Machine Learning
KW - Multiple Classifier Systems
UR - https://www.scopus.com/pages/publications/105010221467
U2 - 10.1109/CISDBCompanion65092.2025.11010722
DO - 10.1109/CISDBCompanion65092.2025.11010722
M3 - Contribution to conference proceedings
AN - SCOPUS:105010221467
T3 - 2025 IEEE Symposium on Computational Intelligence in Security, Defence and Biometrics Companion, CISDB Companion 2025
BT - 2025 IEEE Symposium on Computational Intelligence in Security, Defence and Biometrics Companion, CISDB Companion 2025
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 17 March 2025 through 20 March 2025
ER -