Untangling GPU Power Consumption: Job-Level Inference in Cloud Shared Settings

Pierre Jacquet; Maxime Agusti; Eddy Caron; Camille Coti; Marcos Dias De Assunção; Laurent Lefèvre; Anne Cécile Orgerie

doi:10.1145/3767295.3769333

École de technologie supérieure
Université Claude Bernard Lyon 1
University of Rennes 1

Research output: Chapter in book, report, or conference proceedings › Conference contribution › Peer-reviewed

Abstract

As the demand for AI-driven workloads increases, the energy consumption of Graphics Processing Unit (GPU) devices has come under intense scrutiny, particularly in hyperscale data centers where large numbers of accelerators are centralized and leased to diverse clients. In the context of cloud hyperscalers, GPU power monitoring presents several challenges that vary depending on the product offered. The monitoring capabilities of physical devices may be limited or even absent for some products. However, given the substantial energy demands of GPUs, power monitoring is essential for both cloud providers and clients. Operators require tools to manage power distribution effectively, such as balancing workloads across Power Distribution Units (PDUs), while clients need visibility into power usage to optimize their workloads for energy efficiency. To address these challenges, we propose methods for estimating the energy consumption of jobs running on GPU devices in cloud environments, spanning from shared and managed offerings like ML-as-a-Service (MLaaS) to less managed products (e.g., Infrastructure-as-a-Service (IaaS)). Our models demonstrate the benefits of sharing GPUs for small AI workloads, as well as the current sub-optimal utilization of GPUs in cloud hyperscalers, based on insights from an IaaS GPU cluster.

Original language	English
Title of host publication	EUROSYS 2026 - Proceedings of the 2026 European Conference on Computer Systems
Publisher	Association for Computing Machinery, Inc
Pages	624-640
Number of pages	17
ISBN (Electronic)	9798400722127
DOIs	https://doi.org/10.1145/3767295.3769333
Publication status	Published - 26 Apr 2026
Event	2026 European Conference on Computer Systems, EUROSYS 2026 - Edinburgh, United Kingdom. Duration: 27 Apr 2026 → 30 Apr 2026

Publication series

Name	EUROSYS 2026 - Proceedings of the 2026 European Conference on Computer Systems

Conference

Conference	2026 European Conference on Computer Systems, EUROSYS 2026
Country/Territory	United Kingdom
City	Edinburgh
Period	27/04/26 → 30/04/26

Access to document

10.1145/3767295.3769333

Cite this

Jacquet, P., Agusti, M., Caron, E., Coti, C., De Assunção, M. D., Lefèvre, L., & Orgerie, A. C. (2026). Untangling GPU Power Consumption: Job-Level Inference in Cloud Shared Settings. In EUROSYS 2026 - Proceedings of the 2026 European Conference on Computer Systems (pp. 624-640). (EUROSYS 2026 - Proceedings of the 2026 European Conference on Computer Systems). Association for Computing Machinery, Inc. https://doi.org/10.1145/3767295.3769333

@inproceedings{7ce48326929a4e2984ef551e1bf12bc4,

title = "Untangling GPU Power Consumption: Job-Level Inference in Cloud Shared Settings",

abstract = "As the demand for AI-driven workloads increases, the energy consumption of Graphics Processing Unit (GPU) devices has come under intense scrutiny, particularly in hyperscale data centers where large numbers of accelerators are centralized and leased to diverse clients. In the context of cloud hyperscalers, GPU power monitoring presents several challenges that vary depending on the product offered. The monitoring capabilities of physical devices may be limited or even absent for some products. However, given the substantial energy demands of GPUs, power monitoring is essential for both cloud providers and clients. Operators require tools to manage power distribution effectively, such as balancing workloads across Power Distribution Units (PDUs), while clients need visibility into power usage to optimize their workloads for energy efficiency. To address these challenges, we propose methods for estimating the energy consumption of jobs running on GPU devices in cloud environments, spanning from shared and managed offerings like ML-as-a-Service (MLaaS) to less managed products (e.g., Infrastructure-as-a-Service (IaaS)). Our models demonstrate the benefits of sharing GPUs for small AI workloads, as well as the current sub-optimal utilization of GPUs in cloud hyperscalers, based on insights from an IaaS GPU cluster.",

keywords = "Cloud computing, GPU, Power consumption",

author = "Pierre Jacquet and Maxime Agusti and Eddy Caron and Camille Coti and {De Assun{\c c}{\~a}o}, {Marcos Dias} and Laurent Lef{\`e}vre and Orgerie, {Anne C{\'e}cile}",

note = "Publisher Copyright: {\textcopyright} 2026 Copyright held by the owner/author(s); 2026 European Conference on Computer Systems, EUROSYS 2026 ; Conference date: 27-04-2026 Through 30-04-2026",

year = "2026",

month = apr,

day = "26",

doi = "10.1145/3767295.3769333",

language = "English",

series = "EUROSYS 2026 - Proceedings of the 2026 European Conference on Computer Systems",

publisher = "Association for Computing Machinery, Inc",

pages = "624--640",

booktitle = "EUROSYS 2026 - Proceedings of the 2026 European Conference on Computer Systems",

}

Jacquet, P, Agusti, M, Caron, E, Coti, C, De Assunção, MD, Lefèvre, L & Orgerie, AC 2026, Untangling GPU Power Consumption: Job-Level Inference in Cloud Shared Settings. In EUROSYS 2026 - Proceedings of the 2026 European Conference on Computer Systems. EUROSYS 2026 - Proceedings of the 2026 European Conference on Computer Systems, Association for Computing Machinery, Inc, pp. 624-640, 2026 European Conference on Computer Systems, EUROSYS 2026, Edinburgh, United Kingdom, 27/04/26. https://doi.org/10.1145/3767295.3769333

Untangling GPU Power Consumption: Job-Level Inference in Cloud Shared Settings. / Jacquet, Pierre; Agusti, Maxime; Caron, Eddy et al.
EUROSYS 2026 - Proceedings of the 2026 European Conference on Computer Systems. Association for Computing Machinery, Inc, 2026. p. 624-640 (EUROSYS 2026 - Proceedings of the 2026 European Conference on Computer Systems).

TY - GEN

T1 - Untangling GPU Power Consumption: Job-Level Inference in Cloud Shared Settings

T2 - 2026 European Conference on Computer Systems, EUROSYS 2026

AU - Jacquet, Pierre

AU - Agusti, Maxime

AU - Caron, Eddy

AU - Coti, Camille

AU - De Assunção, Marcos Dias

AU - Lefèvre, Laurent

AU - Orgerie, Anne Cécile

PY - 2026/4/26

Y1 - 2026/4/26

N2 - As the demand for AI-driven workloads increases, the energy consumption of Graphics Processing Unit (GPU) devices has come under intense scrutiny, particularly in hyperscale data centers where large numbers of accelerators are centralized and leased to diverse clients. In the context of cloud hyperscalers, GPU power monitoring presents several challenges that vary depending on the product offered. The monitoring capabilities of physical devices may be limited or even absent for some products. However, given the substantial energy demands of GPUs, power monitoring is essential for both cloud providers and clients. Operators require tools to manage power distribution effectively, such as balancing workloads across Power Distribution Units (PDUs), while clients need visibility into power usage to optimize their workloads for energy efficiency. To address these challenges, we propose methods for estimating the energy consumption of jobs running on GPU devices in cloud environments, spanning from shared and managed offerings like ML-as-a-Service (MLaaS) to less managed products (e.g., Infrastructure-as-a-Service (IaaS)). Our models demonstrate the benefits of sharing GPUs for small AI workloads, as well as the current sub-optimal utilization of GPUs in cloud hyperscalers, based on insights from an IaaS GPU cluster.

KW - Cloud computing

KW - GPU

KW - Power consumption

UR - https://www.scopus.com/pages/publications/105038436791

U2 - 10.1145/3767295.3769333

DO - 10.1145/3767295.3769333

M3 - Contribution to conference proceedings

AN - SCOPUS:105038436791

T3 - EUROSYS 2026 - Proceedings of the 2026 European Conference on Computer Systems

SP - 624

EP - 640

BT - EUROSYS 2026 - Proceedings of the 2026 European Conference on Computer Systems

PB - Association for Computing Machinery, Inc

Y2 - 27 April 2026 through 30 April 2026

ER -

Jacquet P, Agusti M, Caron E, Coti C, De Assunção MD, Lefèvre L et al. Untangling GPU Power Consumption: Job-Level Inference in Cloud Shared Settings. In EUROSYS 2026 - Proceedings of the 2026 European Conference on Computer Systems. Association for Computing Machinery, Inc. 2026. p. 624-640. (EUROSYS 2026 - Proceedings of the 2026 European Conference on Computer Systems). doi: 10.1145/3767295.3769333