
A Multimodal In-Ear Audio and Physiological Dataset for Swallowing and Non-Verbal Event Classification

  • École de technologie supérieure

Research output: Contribution to journal › Journal Article › peer-review

Abstract

Swallowing is a critical marker of neurological and emotional health. The ability to monitor it continuously and non-invasively, especially through smart ear-worn devices, holds significant promise for clinical applications. Despite this potential, no public audio datasets currently support reliable swallowing sound detection. Existing datasets focus primarily on speech and breathing, offering limited coverage and lacking detailed annotations for swallowing events. To address this gap, we introduce an in-ear audio dataset specifically designed to capture a wide range of verbal and non-verbal sounds. It includes comprehensive labeling focused on swallowing. The dataset was collected from 34 healthy adults (14 females and 20 males) between the ages of 20 and 29. Each participant performed a series of predefined tasks involving both non-verbal and verbal events. Non-verbal tasks included swallowing, clicking, forceful blinking, touching the scalp, and physical movements such as squatting or walking in place. Verbal tasks consisted of speaking (e.g., describing an image). Recordings were conducted in both quiet and noisy environments to better reflect real-world conditions. Data were captured using a combination of in-/outer-ear microphones, a chest belt to record electrocardiogram (ECG), respiration and acceleration signals, and an ultrasound probe to track tongue movement, which served as a reference for swallowing annotation. All signals were precisely synchronized. To ensure high data quality, the recordings were reviewed using both algorithmic analysis and manual inspection. Swallowing events were identified based on ultrasound signals and validated by an expert to guarantee accurate labeling. As a proof of concept that in-ear audio supports swallow classification, we fine-tune a fully connected neural network on YAMNet embeddings plus zero-crossing rate (ZCR) features. Across the completed folds, the model reaches an F1 score of 0.875 ± 0.013.
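The feature construction mentioned at the end of the abstract (YAMNet embeddings plus ZCR) might look roughly like the following minimal sketch. It is an illustration under stated assumptions, not the authors' pipeline: the frame length, hop size, the choice to summarize ZCR by its mean and standard deviation, and the function names are all hypothetical, and the 1024-dimensional embedding is passed in as a precomputed array rather than obtained from YAMNet here.

```python
import numpy as np


def zero_crossing_rate(frame: np.ndarray) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.signbit(frame)
    return float(np.mean(signs[1:] != signs[:-1]))


def frame_signal(x: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Slice a 1-D signal into overlapping frames (no padding)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])


def build_features(
    embedding: np.ndarray,  # assumed: one precomputed 1024-dim YAMNet embedding
    audio: np.ndarray,      # assumed: mono in-ear audio segment
    frame_len: int = 400,   # hypothetical frame length in samples
    hop: int = 160,         # hypothetical hop size in samples
) -> np.ndarray:
    """Concatenate the embedding with summary statistics of frame-wise ZCR."""
    frames = frame_signal(audio, frame_len, hop)
    zcr = np.array([zero_crossing_rate(f) for f in frames])
    return np.concatenate([embedding, [zcr.mean(), zcr.std()]])
```

The resulting vector would then be the input to a small fully connected classifier; how the paper actually combines the two feature types is not specified in the abstract.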

Original language: English
Article number: 2019
Journal: Sensors
Volume: 26
Issue number: 7
Publication status: Published - Apr 2026

Keywords

  • in-ear microphone
  • multimodal dataset
  • non-verbal events
  • swallowing classification
