Résumé
The early detection and diagnosis of neonatal problems are critical to ensuring that an infant receives timely medical attention, which greatly enhances health outcomes. In this study, we propose a novel deep learning framework that listens to an infant’s cry to identify and diagnose six separate conditions: one being healthy and the other five comprising sepsis, respiratory distress syndrome, jaundice, hyperbilirubinemia, and vomiting. The study utilizes a rich dataset of infant cry recordings from which key acoustic features such as spectrograms, Mel-spectrograms, and Gammatone Frequency Cepstral Coefficients (GFCCs) are extracted. A sophisticated Vision Transformer (ViT) model was developed and meticulously fine-tuned to achieve an impressive 99% classification accuracy through cross-validation. To enhance the model’s interpretability, powerful explainable artificial intelligence (XAI) methods such as LRP, LIME, and attention imaging were implemented to clarify the reasoning behind the model’s outputs. Through cross-validation tests, the model’s trustworthiness and extensive generalizability were assessed. The findings underscore the promising capabilities of employing transformer-based deep learning frameworks along with multimodal acoustic features and explanatory methods to improve cry analysis in infants and their usable scopes in pediatric medicine.
| langue originale | Anglais |
|---|---|
| Pages (de - à) | 161103-161115 |
| Nombre de pages | 13 |
| journal | IEEE Access |
| Volume | 13 |
| Les DOIs | |
| état | Publié - 2025 |
| Modification externe | Oui |
Empreinte digitale
Voici les principaux termes ou expressions associés à « Unveiling Hidden Patterns in Infant Cry Audio: A Multi-Feature Vision Transformer Approach With Explainable AI ». Ces libellés thématiques sont générés à partir du titre et du résumé de la publication. Ensemble, ils forment une empreinte digitale unique.Contient cette citation
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver