Recursive Joint Attention for Audio-Visual Fusion in Regression Based Emotion Recognition

Research output: Chapter in book, report, or conference proceedings › Conference contribution › Peer-reviewed

18 Citations (Scopus)

Abstract

In video-based emotion recognition (ER), it is important to effectively leverage the complementary relationship between the audio (A) and visual (V) modalities while retaining the intramodal characteristics of each modality. In this paper, a recursive joint attention model is proposed along with long short-term memory (LSTM) modules for the fusion of vocal and facial expressions in regression-based ER. Specifically, we investigate the possibility of exploiting the complementary nature of the A and V modalities by applying a joint cross-attention model recursively, with LSTMs capturing the temporal dependencies within each modality as well as among the A-V feature representations. By integrating LSTMs with recursive joint cross-attention, our model can efficiently leverage both intra- and inter-modal relationships for the fusion of A and V modalities. The results of extensive experiments performed on the challenging Affwild2 and Fatigue (private) datasets indicate that the proposed A-V fusion model can significantly outperform state-of-the-art methods.
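
To make the idea of applying joint cross-attention recursively more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the scoring function (a scaled tanh correlation followed by a softmax over time), the weight shapes, the residual update, and all names (`joint_cross_attention_step`, `recursive_joint_attention`, `w_a`, `w_v`, `n_iters`) are hypothetical choices, and the LSTM modules mentioned in the abstract are omitted for brevity.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def joint_cross_attention_step(x_a, x_v, w_a, w_v):
    """One attention pass (assumed form, for illustration only).

    x_a: (T, d_a) audio features, x_v: (T, d_v) visual features.
    w_a: (d_a, d) and w_v: (d_v, d) projection weights, with d = d_a + d_v.
    """
    # Joint representation: concatenate both modalities per time step.
    joint = np.concatenate([x_a, x_v], axis=1)  # (T, d)
    d = joint.shape[1]

    # Correlate each modality with the joint representation across time.
    c_a = np.tanh(x_a @ w_a @ joint.T / np.sqrt(d))  # (T, T)
    c_v = np.tanh(x_v @ w_v @ joint.T / np.sqrt(d))  # (T, T)

    # Turn correlations into attention weights over time steps.
    att_a = softmax(c_a, axis=1)
    att_v = softmax(c_v, axis=1)

    # Residual update keeps the original intramodal features in the mix.
    return x_a + att_a @ x_a, x_v + att_v @ x_v


def recursive_joint_attention(x_a, x_v, w_a, w_v, n_iters=2):
    """Feed the attended features back into the same step n_iters times."""
    for _ in range(n_iters):
        x_a, x_v = joint_cross_attention_step(x_a, x_v, w_a, w_v)
    # Fused A-V representation, e.g. as input to a regression head.
    return np.concatenate([x_a, x_v], axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    audio = rng.normal(size=(5, 3))    # 5 time steps, 3-dim audio features
    visual = rng.normal(size=(5, 4))   # 5 time steps, 4-dim visual features
    w_audio = rng.normal(size=(3, 7))
    w_visual = rng.normal(size=(4, 7))
    fused = recursive_joint_attention(audio, visual, w_audio, w_visual)
    print(fused.shape)
```

In this sketch the same weights are reused at every iteration. Whether weights are shared across iterations, and how many iterations are used, are design choices that the abstract does not specify.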

Original language: English
Title: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728163277
Status: Published - 2023
Event: 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023 - Rhodes Island, Greece
Duration: 4 June 2023 to 10 June 2023

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2023-June
ISSN (Print): 1520-6149

Conference

Conference: 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023
Country/Territory: Greece
City: Rhodes Island
Period: 4/06/23 to 10/06/23
