Unveiling Hidden Patterns in Infant Cry Audio: A Multi-Feature Vision Transformer Approach With Explainable AI

Research output: Contribution to journal, Journal Article (peer-reviewed)

Abstract

Early detection and diagnosis of neonatal disorders are critical to ensuring that an infant receives timely medical attention, which greatly improves health outcomes. In this study, we propose a deep learning framework that analyzes recordings of an infant's cry and classifies them into six categories: healthy, or one of five conditions (sepsis, respiratory distress syndrome, jaundice, hyperbilirubinemia, and vomiting). From a large dataset of infant cry recordings, we extract key acoustic features, including spectrograms, Mel-spectrograms, and Gammatone Frequency Cepstral Coefficients (GFCCs). A Vision Transformer (ViT) model was developed and fine-tuned on these features and achieved 99% classification accuracy under cross-validation, which was also used to assess the model's reliability and generalizability. To make the model interpretable, we applied explainable artificial intelligence (XAI) methods, including layer-wise relevance propagation (LRP), local interpretable model-agnostic explanations (LIME), and attention visualization, to clarify the reasoning behind its outputs. The findings highlight the potential of combining transformer-based deep learning, multiple acoustic feature representations, and explanation methods to improve infant cry analysis, and point to possible applications in pediatric medicine.
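
To illustrate how an audio feature such as a Mel-spectrogram can be fed to a Vision Transformer, the following is a minimal NumPy sketch, not the authors' pipeline. It assumes a 16 kHz mono waveform and illustrative settings (FFT size 512, hop 128, 64 HTK-style Mel bands, 16x16 patches) that are not taken from the paper; the function names are hypothetical. It computes a log-Mel spectrogram and splits it into flattened, non-overlapping patches of the kind a ViT projects into tokens. GFCC extraction, the ViT itself, and the XAI methods are not shown.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style Mel scale (an assumption; other Mel formulas exist)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def stft_power(x, n_fft=512, hop=128):
    """Power spectrogram of shape (frames, n_fft // 2 + 1) using a Hann window."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def mel_filterbank(sr, n_fft, n_mels=64, fmin=0.0, fmax=None):
    """Triangular filters spaced evenly on the Mel scale, shape (n_mels, n_fft // 2 + 1)."""
    fmax = fmax or sr / 2
    mel_points = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):   # rising edge of the triangle
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge of the triangle
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel_spectrogram(x, sr, n_fft=512, hop=128, n_mels=64):
    """Log-Mel spectrogram in dB, shape (n_mels, frames), i.e. an 'image' of the cry."""
    power = stft_power(x, n_fft, hop)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return 10.0 * np.log10(mel + 1e-10).T

def to_patches(img, patch=16):
    """Split a 2-D feature map into flattened patch x patch tiles (ViT-style tokens)."""
    h, w = img.shape
    h, w = h - h % patch, w - w % patch  # crop so both sides are multiples of patch
    img = img[:h, :w]
    return (img.reshape(h // patch, patch, w // patch, patch)
               .transpose(0, 2, 1, 3)     # group rows/cols of each tile together
               .reshape(-1, patch * patch))

# Usage: a synthetic 1-second 440 Hz tone stands in for a cry recording.
sr = 16000
t = np.arange(sr) / sr
wave = np.sin(2 * np.pi * 440.0 * t)
feat = log_mel_spectrogram(wave, sr)  # shape (64, 122)
tokens = to_patches(feat, patch=16)   # shape (28, 256): 4 x 7 tiles of 16 x 16
```

In a ViT, each row of `tokens` would then be linearly projected to the model dimension and combined with a position embedding; in a multi-feature setup, spectrogram or GFCC maps could be patched the same way. Cropping the trailing frames is a simplification; padding is an equally common choice.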

Original language: English
Pages (from-to): 161103-161115
Number of pages: 13
Journal: IEEE Access
Volume: 13
Publication status: Published, 2025
Externally published: Yes

Keywords

  • Gammatone frequency cepstral coefficients (GFCC)
  • Infant cry classification
  • explainable AI (XAI)
  • feature extraction
  • layer-wise relevance propagation (LRP)
  • local interpretable model-agnostic explanations (LIME)
  • Mel-spectrogram
  • multi-feature audio representation
  • spectrogram
  • vision transformer (ViT)
