TY - GEN
T1 - Integration framework for speech processing with live visualization interfaces
AU - Brodeur, David
AU - Grondin, Francois
AU - Attabi, Yazid
AU - Dumouchel, Pierre
AU - Michaud, Francois
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2016/11/15
Y1 - 2016/11/15
N2 - Audition is a rich source of spatial, identity, linguistic and paralinguistic information. Processing all this information requires acquisition, processing and interpretation of sound sources, which are instantaneous, invisible and noisy signals. This can lead to different responses by the system in relation to the information perceived. This paper presents our first implementation of an integration framework for speech processing. Acquisition includes sound capture, sound source localization, tracking, separation and enhancement, and voice activity detection. Processing involves speech and emotion recognition. Interpretation consists of translating speech utterances into commands that can influence interaction through dialogue management and speech synthesis. The paper also describes two visualization interfaces, inspired by comic strips, to represent live vocal interactions in real life environments. These interfaces are used to demonstrate how the framework performs in live interactions and its use in a usability study.
UR - https://www.scopus.com/pages/publications/85002986131
U2 - 10.1109/ROMAN.2016.7745103
DO - 10.1109/ROMAN.2016.7745103
M3 - Contribution to conference proceedings
AN - SCOPUS:85002986131
T3 - 25th IEEE International Symposium on Robot and Human Interactive Communication, RO-MAN 2016
SP - 144
EP - 150
BT - 25th IEEE International Symposium on Robot and Human Interactive Communication, RO-MAN 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 25th IEEE International Symposium on Robot and Human Interactive Communication, RO-MAN 2016
Y2 - 26 August 2016 through 31 August 2016
ER -