Transformer Meets Gated Residual Networks to Enhance PICU’s PPG Artifact Detection Informed by Mutual Information Neural Estimation

  • Thanh Dung Le
  • Clara Macabiau
  • Kevin Albert
  • Symeon Chatzinotas
  • Philippe Jouvet
  • Rita Noumeir

Research output: Contribution to journal › Journal article › Peer-reviewed

Abstract

This study delves into the effectiveness of various learning methods for improving Transformer models, focusing mainly on the Gated Residual Network (GRN) Transformer in the context of pediatric intensive care units (PICUs), where data availability is limited. Our findings indicate that Transformers trained via supervised learning are less effective than MLP, CNN, and LSTM networks in such environments. Yet leveraging unsupervised and self-supervised learning (SSL) on unannotated data, with subsequent fine-tuning on annotated data, notably enhances Transformer performance, although not to the level of the GRN–Transformer. Central to our research is an analysis of different activation functions for the gated linear unit (GLU), a crucial element of the GRN structure. We also employ Mutual Information Neural Estimation (MINE) to evaluate the GRN's contribution. Additionally, the study examines the effects of integrating the GRN within the Transformer's attention mechanism versus using it as a separate intermediary layer. Our results highlight that the GLU with sigmoid activation stands out, achieving 0.98 accuracy, 0.91 precision, 0.96 recall, and 0.94 F1-score. The MINE analysis supports the hypothesis that the GRN enhances the mutual information (MI) between the hidden representations and the output. Moreover, using the GRN as an intermediate filter layer proves more beneficial than incorporating it within the attention mechanism. This study clarifies how the GRN boosts the GRN–Transformer's performance beyond that of other techniques. These findings offer a promising avenue for adopting sophisticated models like Transformers in data-constrained environments, such as photoplethysmogram (PPG) artifact detection in PICU settings.
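For readers who want a concrete picture of the two components the abstract discusses, the sketch below is a minimal, hypothetical PyTorch rendering of (i) a GRN with a sigmoid-activated GLU used as an intermediate filter layer after a Transformer encoder block, and (ii) a MINE statistics network with the Donsker–Varadhan lower bound used to probe mutual information. All layer sizes, the pooling step, and every name here are illustrative assumptions, not the paper's exact configuration.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedResidualNetwork(nn.Module):
    """Minimal GRN sketch: dense -> ELU -> dense, then a sigmoid GLU gate,
    a residual connection, and layer normalization. Sizes are illustrative."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.fc1 = nn.Linear(d_model, d_hidden)
        self.fc2 = nn.Linear(d_hidden, d_model)
        self.value = nn.Linear(d_model, d_model)  # GLU value projection
        self.gate = nn.Linear(d_model, d_model)   # GLU gate projection
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.fc2(F.elu(self.fc1(x)))
        gated = torch.sigmoid(self.gate(h)) * self.value(h)  # GLU with sigmoid activation
        return self.norm(x + gated)                          # residual + LayerNorm

class MineStatistics(nn.Module):
    """Statistics network T(h, y) for the Donsker-Varadhan bound
    I(H; Y) >= E_joint[T] - log E_marginal[exp(T)]."""
    def __init__(self, d_h: int, d_y: int, d_hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_h + d_y, d_hidden), nn.ReLU(), nn.Linear(d_hidden, 1))

    def forward(self, h: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([h, y], dim=-1))

def mine_lower_bound(T: MineStatistics, h: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """One-batch estimate of the DV bound; shuffling y breaks the joint pairing."""
    joint = T(h, y).mean()
    marginal = T(h, y[torch.randperm(y.size(0))])
    log_mean_exp = torch.logsumexp(marginal, dim=0) - math.log(y.size(0))
    return joint - log_mean_exp.squeeze()

# GRN placed as an intermediate filter layer after a Transformer encoder
# block -- one of the two placements the study compares.
encoder = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
grn = GatedResidualNetwork(d_model=64, d_hidden=128)
ppg = torch.randn(8, 120, 64)                      # toy batch: (batch, time, features)
hidden = grn(encoder(ppg)).mean(dim=1)             # pooled hidden representation
labels = torch.eye(2)[torch.randint(0, 2, (8,))]   # toy artifact/clean one-hot labels
mi_estimate = mine_lower_bound(MineStatistics(d_h=64, d_y=2), hidden, labels)
```

Note that a meaningful MI estimate requires training the statistics network to maximize the bound; the untrained network above only illustrates the plumbing through which the study compares hidden representations with and without the GRN filter.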

Original language: English
Journal: IEEE Transactions on Neural Networks and Learning Systems
DOIs
Status: Accepted/In press - 2026
External modification: Yes
