<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
	<record>
		<datafield tag="980" ind1=" " ind2=" ">
			<subfield code="a">CONF</subfield>
		</datafield>
		<datafield tag="970" ind1=" " ind2=" ">
			<subfield code="a">Motlicek_SIDS2025_2025/IDIAP</subfield>
		</datafield>
		<datafield tag="245" ind1=" " ind2=" ">
			<subfield code="a">Leveraging Untranscribed Data for End-to-End Speech and Callsign Recognition in Air-Traffic Communication</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">Motlicek, Petr</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">Kumar, Shashi</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">Khalil, Driss</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">Prasad, Amrutha</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">Schüpbach, Christof</subfield>
		</datafield>
		<datafield tag="653" ind1="1" ind2=" ">
			<subfield code="a">Air traffic control</subfield>
		</datafield>
		<datafield tag="653" ind1="1" ind2=" ">
			<subfield code="a">Automatic Speech Recognition</subfield>
		</datafield>
		<datafield tag="653" ind1="1" ind2=" ">
			<subfield code="a">semi-supervised learning</subfield>
		</datafield>
		<datafield tag="711" ind1="2" ind2=" ">
			<subfield code="a">Eurocontrol - SESAR Innovation Days 2025 (https://www.sesarju.eu/SIDS2025)</subfield>
			<subfield code="c">Bled, Slovenia</subfield>
		</datafield>
		<datafield tag="260" ind1=" " ind2=" ">
			<subfield code="c">2025</subfield>
		</datafield>
		<datafield tag="856" ind1="4" ind2=" ">
			<subfield code="u">https://www.sesarju.eu/sites/default/files/documents/sid/2025/papers/SIDs_2025_paper_103-final.pdf</subfield>
			<subfield code="z">URL</subfield>
		</datafield>
		<datafield tag="520" ind1=" " ind2=" ">
			<subfield code="a">Accurate Automatic Speech Recognition (ASR) and callsign recognition in Air Traffic Control (ATC) are vital for safety, yet conventional two-step systems rely on large amounts of manually transcribed data, which is both costly and limited. This paper introduces a practical alternative using TokenVerse, a unified end-to-end model trained under a dual-task framework and enhanced through semi-supervised learning. Our main contribution shows that the model can jointly learn callsign boundaries and speech recognition, improving performance on both tasks simultaneously. Additionally, by generating pseudo-labels for 500 hours of unlabeled audio, we substantially expand the effective training data.
Experiments across multiple in-domain and out-of-domain ATC datasets demonstrate that the TokenVerse framework achieves state-of-the-art performance in both ASR and callsign detection, surpassing cascaded pipelines built on modern architectures (including Kaldi, XLSR/wav2vec 2.0, Zipformer, and Whisper). This work provides a robust and scalable foundation for deploying and continuously refining high-accuracy ATC systems in real-world settings where labeled data is inherently scarce. The end-to-end architecture is also relatively compact (approximately 317M parameters), making it well suited for real-time, low-latency deployment.</subfield>
		</datafield>
	</record>
</collection>