Unveiling Hidden Patterns in Infant Cry Audio: A Multi-Feature Vision Transformer Approach With Explainable AI

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Early detection and diagnosis of neonatal problems are critical to ensuring that an infant receives timely medical attention, which greatly improves health outcomes. In this study, we propose a deep learning framework that analyzes an infant’s cry to classify six conditions: healthy, plus five pathological conditions (sepsis, respiratory distress syndrome, jaundice, hyperbilirubinemia, and vomiting). The study uses a dataset of infant cry recordings from which three acoustic features are extracted: spectrograms, Mel-spectrograms, and Gammatone Frequency Cepstral Coefficients (GFCCs). A Vision Transformer (ViT) model was developed and fine-tuned, achieving 99% classification accuracy under cross-validation. To improve the model’s interpretability, explainable artificial intelligence (XAI) methods such as Layer-wise Relevance Propagation (LRP), Local Interpretable Model-agnostic Explanations (LIME), and attention imaging were applied to clarify the reasoning behind the model’s outputs. Cross-validation was also used to assess the model’s reliability and generalizability. The findings highlight the potential of transformer-based deep learning frameworks, combined with multiple acoustic features and explanation methods, to improve infant cry analysis and its practical use in pediatric medicine.

Original language: English
Pages (from-to): 161103-161115
Number of pages: 13
Journal: IEEE Access
Volume: 13
State: Published - 2025
Externally published: Yes
