CONF
Khosravani_ASRU_2021/IDIAP
Learning to Translate Low-Resourced Swiss German Dialectal Speech into Standard German Text
Khosravani, Abbas
Garner, Philip N.
Lazaridis, Alexandros
EXTERNAL https://publications.idiap.ch/attachments/papers/2022/Khosravani_ASRU_2021.pdf PUBLIC
IEEE Automatic Speech Recognition and Understanding Workshop
Colombia, Cartagena
2021
IEEE

For a low-resourced language such as Swiss German, which has no standard orthography and varies considerably in its written form, spoken language resources are more likely to come with translations than with transcriptions. Moreover, the desired output of an automatic transcription system for multi-dialectal Swiss German speech is Standard German, because that is what applications such as our TV Box voice assistant and broadcast media require. A translation step is therefore usually needed, since Swiss German and Standard German differ at all linguistic levels. Unfortunately, there are neither enough parallel text corpora to train a proper translation system nor enough in-domain speech translation (ST) data to train an ST system. We investigate an end-to-end approach to multi-dialect Swiss German ST using transfer learning. Our ST model is based on an encoder-decoder architecture in which the encoder is initialized with a cross-lingual speech representation model that has been adapted to in-domain Swiss German speech data. We demonstrate that training the decoder on an out-of-domain ST corpus while preserving the encoder unit, and then fine-tuning on in-domain ST data, can be more effective than a cascade or a vanilla direct ST system.
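
The staged transfer-learning recipe described in the abstract can be summarized as a small schedule. The following is only an illustrative sketch, not the authors' implementation: the names `Stage`, `SCHEDULE`, and `trainable_modules` are hypothetical, and the choice to update both encoder and decoder in the final fine-tuning stage is an assumption, since the abstract does not say which modules are updated there.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One training stage: the data it uses and the modules it updates."""
    name: str
    data: str
    train_encoder: bool
    train_decoder: bool


# Hypothetical schedule mirroring the order of steps in the abstract.
SCHEDULE = [
    # Adapt the cross-lingual speech representation model to in-domain speech.
    Stage("adapt_encoder", "in-domain Swiss German speech", True, False),
    # Train the decoder on out-of-domain ST data while preserving the encoder.
    Stage("train_decoder", "out-of-domain ST corpus", False, True),
    # ASSUMPTION: both modules are updated here; the abstract leaves this open.
    Stage("fine_tune", "in-domain ST data", True, True),
]


def trainable_modules(stage: Stage) -> list[str]:
    """Return the names of the modules that receive updates in this stage."""
    modules = []
    if stage.train_encoder:
        modules.append("encoder")
    if stage.train_decoder:
        modules.append("decoder")
    return modules


for stage in SCHEDULE:
    print(f"{stage.name}: {stage.data} -> {trainable_modules(stage)}")
```

In a real training setup, each stage would translate into enabling or disabling gradient updates for the corresponding module before running the optimizer on that stage's data.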