Abstract
Non-contact vital sign monitoring in Pediatric Intensive Care Units is challenged by frequent occlusions, data scarcity, and the need for temporally stable anatomical tracking to extract reliable physiological signals. Traditional detectors produce unstable tracking, while video transformers are too computationally intensive for deployment on resource-limited clinical hardware. We introduce Divided Space–Time Mamba, an architecture that decouples spatial and temporal feature learning using State Space Models to achieve linear-time complexity, over 92% lower than standard transformers. To handle data scarcity, we employ self-supervised pre-training with masked autoencoders on over 50k domain-specific video clips and further enhance robustness with multimodal RGB-D input. Our model demonstrates superior performance, achieving 0.96 mAP@0.5, 0.62 mAP50-95, and 0.95 rotated IoU. Operating at 23 FPS (43 ms latency), our method is approximately 1.9× faster than VideoMAE and 5.7× faster than frame-wise YOLOv8, demonstrating its suitability for real-time clinical monitoring.
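The divided space-time factorization described above can be illustrated with a toy sketch: a linear-time state-space recurrence is applied first across spatial tokens within each frame, then across time at each token position, so cost grows linearly in both sequence lengths. This is a minimal illustration only, assuming a scalar recurrence `h[t] = a*h[t-1] + b*x[t]`; the function names, the fixed `a`/`b` coefficients, and the token layout are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def ssm_scan(x, a=0.9, b=0.1):
    """Linear-time state-space recurrence h[t] = a*h[t-1] + b*x[t] along axis 0.

    Cost is O(L) in the scanned length L, unlike the O(L^2) attention
    of a standard transformer over the same sequence.
    """
    h = np.zeros_like(x[0])
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        h = a * h + b * x[t]
        out[t] = h
    return out

def divided_space_time(clip):
    """clip: (T, N, C) array — T frames, N spatial tokens, C channels.

    Spatial pass: scan over the N tokens independently within each frame.
    Temporal pass: scan over the T frames at each token position.
    Total cost is O(T*N), i.e. linear in both dimensions.
    """
    spatial = np.stack([ssm_scan(frame) for frame in clip])  # per-frame scan
    return ssm_scan(spatial)                                 # cross-frame scan

clip = np.random.default_rng(0).standard_normal((8, 16, 4))  # T=8, N=16, C=4
y = divided_space_time(clip)
print(y.shape)  # (8, 16, 4)
```

The two passes never mix space and time in a single pass, which is what keeps the complexity linear rather than quadratic in the token count.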
| Original language | English |
|---|---|
| Article number | 1706 |
| Journal | Life |
| Volume | 15 |
| Issue number | 11 |
| DOIs | |
| State | Published - Nov. 2025 |
| Externally published | Yes |
Fingerprint
These are the main terms or phrases associated with "PICU Face and Thoracoabdominal Detection Using Self-Supervised Divided Space–Time Mamba". These topic labels are generated from the title and abstract of the publication; together they form a unique fingerprint.