TY - GEN
T1 - GHAminer
T2 - 32nd IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2025
AU - Khelifi, Jasem
AU - Benzina, Yacine
AU - Chouchen, Moataz
AU - Ouni, Ali
AU - Sayagh, Mohammed
AU - Bouktif, Salah
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
AB - GitHub Actions (GHA) has become one of the most popular Continuous Integration (CI) platforms in open-source software (OSS) and commercial projects. Collecting GHA build data remains crucial for practitioners and researchers to enable build performance monitoring, optimization, and improvement. However, mining GHA builds to collect build-related data and metrics remains challenging and time-consuming. This paper introduces GHAminer, an open-source tool designed to collect build-related metrics for GitHub Actions. GHAminer covers various aspects of data, such as the build-related code changes and tests, the build duration and status (e.g., passed, failed, or timeout), and repository metadata, which would be useful for practitioners and researchers to make data-driven decisions to enhance CI efficiency and quality. The tool has a modular architecture that supports efficient data extraction with minimal API load. Specifically, it consists of a set of modules for repository information collection, build analysis, commit history analysis, and build log parsing. We evaluate the performance of GHAminer on a representative sample of 3,151 OSS projects. Results show that GHAminer handles projects of various sizes efficiently, with relatively stable performance when collecting build data for larger projects. GHAminer is publicly available with a demo video at: https://github.com/stilab-ets/GHAminer
KW - GitHub Actions
KW - continuous integration
KW - open source tools
KW - software build
KW - software mining
KW - software quality
UR - https://www.scopus.com/pages/publications/105007302520
U2 - 10.1109/SANER64311.2025.00087
DO - 10.1109/SANER64311.2025.00087
M3 - Contribution to conference proceedings
AN - SCOPUS:105007302520
T3 - Proceedings - 2025 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2025
SP - 834
EP - 838
BT - Proceedings - 2025 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2025
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 4 March 2025 through 7 March 2025
ER -