TY - GEN
T1 - Integration of Modern HPC Performance Tools in Vlasiator for Exascale Analysis and Optimization
AU - Coti, Camille
AU - Pfau-Kempf, Yann
AU - Battarbee, Markus
AU - Ganse, Urs
AU - Shende, Sameer
AU - Huck, Kevin
AU - Rodriquez, Jordi
AU - Kotipalo, Leo
AU - Faj, Jennifer
AU - Williams, Jeremy J.
AU - Peng, Ivy
AU - Malony, Allen D.
AU - Markidis, Stefano
AU - Palmroth, Minna
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Key to the success of developing high-performance applications for present and future heterogeneous supercomputers will be the systematic use of measurement and analysis to understand factors that affect delivered performance in the context of parallelization strategy, heterogeneous programming methodology, data partitioning, and scalable algorithm design. The evolving complexity of future exascale platforms makes it unrealistic for application teams to implement their own tools. Similarly, it is naïve to expect available robust performance tools to work effectively out of the box, without integration and specialization with respect to application-specific requirements and knowledge. Vlasiator is a powerful massively parallel code for accurate magnetospheric and solar wind plasma simulations. It is being ported to the LUMI HPC system for advanced modeling of the Earth's magnetosphere and surrounding solar wind. Building on a preexisting Vlasiator performance API called Phiprof, our work significantly advances the performance measurement and analysis capabilities offered to Vlasiator using the TAU, APEX, and IPM tools. The results presented show an in-depth characterization of node-level CPU/GPU and MPI communication performance. We highlight the integration of high-level Phiprof events with detailed performance data to expose opportunities for performance tuning. Our results provide important insights to optimize Vlasiator for the upcoming exascale machines.
UR - https://www.scopus.com/pages/publications/85200729957
U2 - 10.1109/IPDPSW63119.2024.00170
DO - 10.1109/IPDPSW63119.2024.00170
M3 - Contribution to conference proceedings
AN - SCOPUS:85200729957
T3 - 2024 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2024
SP - 996
EP - 1005
BT - 2024 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2024
Y2 - 27 May 2024 through 31 May 2024
ER -