Deep Audio Features and Self-Supervised Learning for Early Diagnosis of Neonatal Diseases: Sepsis and Respiratory Distress Syndrome Classification from Infant Cry Signals

Research output: Contribution to journalJournal Articlepeer-review

6 Citations (Scopus)

Abstract

Neonatal mortality remains a critical global challenge, particularly in resource-limited settings with restricted access to advanced diagnostic tools. Early detection of life-threatening conditions like Sepsis and Respiratory Distress Syndrome (RDS), which significantly contribute to neonatal deaths, is crucial for timely interventions and improved survival rates. This study investigates the use of newborn cry sounds, specifically the expiratory segments (the most informative parts of cry signals) as non-invasive biomarkers for early disease diagnosis. We utilized an expanded and balanced cry dataset, applying Self-Supervised Learning (SSL) models—wav2vec 2.0, WavLM, and HuBERT—to extract feature representations directly from raw cry audio signals. This eliminates the need for manual feature extraction while effectively capturing complex patterns associated with sepsis and RDS. A classifier consisting of a single fully connected layer was placed on top of the SSL models to classify newborns into Healthy, Sepsis, or RDS groups. We fine-tuned the SSL models and classifiers by optimizing hyperparameters using two learning rate strategies: linear and annealing. Results demonstrate that the annealing strategy consistently outperformed the linear strategy, with wav2vec 2.0 achieving the highest accuracy of approximately 90% (89.76%). These findings highlight the potential of integrating this method into Newborn Cry Diagnosis Systems (NCDSs). Such systems could assist medical staff in identifying critically ill newborns, prioritizing care, and improving neonatal outcomes through timely interventions.

Original languageEnglish
Article number248
JournalElectronics (Switzerland)
Volume14
Issue number2
DOIs
Publication statusPublished - Jan 2025

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being
    SDG 3 Good Health and Well-being

!!!Keywords

  • HuBERT
  • RDS
  • WavLM
  • deep audio features
  • newborn cry diagnosis systems
  • self-supervised learning models
  • sepsis
  • wav2vec 2.0

Fingerprint

Dive into the research topics of 'Deep Audio Features and Self-Supervised Learning for Early Diagnosis of Neonatal Diseases: Sepsis and Respiratory Distress Syndrome Classification from Infant Cry Signals'. These topics are generated from the title and abstract of the publication. Together, they form a unique fingerprint.

Cite this