End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation

Type of publication:	Conference paper
Citation:	Juan_EMNLP_2023
Publication status:	Published
Booktitle:	The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Series:	1
Volume:	1
Number:	1
Year:	2023
Month:	December
Location:	Singapore
URL:	https://arxiv.org/abs/2311.006...
Abstract:	Conventional speech-to-text translation (ST) systems are trained on single-speaker utterances, and they may not generalize to real-life scenarios where the audio contains conversations by multiple speakers. In this paper, we tackle single-channel multi-speaker conversational ST with an end-to-end and multi-task training model, named Speaker-Turn Aware Conversational Speech Translation, that combines automatic speech recognition, speech translation and speaker turn detection using special tokens in a serialized labeling format. We run experiments on the Fisher-CALLHOME corpus, which we adapted by merging the two single-speaker channels into one multi-speaker channel, thus representing the more realistic and challenging scenario with multi-speaker turns and cross-talk. Experimental results across single- and multi-speaker conditions and against conventional ST systems show that our model outperforms the reference systems on the multi-speaker condition, while attaining comparable performance on the single-speaker condition. We release scripts for data processing and model training.
Keywords:
Projects	Idiap
Authors	Zuluaga-Gomez, Juan; Huang, Zhaocheng; Niu, Xing; Srinivasan, Sundararajan; Mathur, Prashant; Thompson, Brian; Federico, Marcello
Attachments
Juan_EMNLP_2023.pdf (Paper - Arxiv)
Notes