<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
	<record>
		<datafield tag="980" ind1=" " ind2=" ">
			<subfield code="a">CONF</subfield>
		</datafield>
		<datafield tag="970" ind1=" " ind2=" ">
			<subfield code="a">Kocour_9THOPENSKYSYMPOSIUM2020_2021/IDIAP</subfield>
		</datafield>
		<datafield tag="245" ind1=" " ind2=" ">
			<subfield code="a">Automatic processing pipeline for collecting and annotating air-traffic voice communication data</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">Kocour, Martin</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">Vesely, Karel</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">Szoke, Igor</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">Kesiraju, Santosh</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">Zuluaga-Gomez, Juan</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">Blatt, Alexander</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">Prasad, Amrutha</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">Nigmatulina, Iuliia</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">Motlicek, Petr</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">et al.</subfield>
		</datafield>
		<datafield tag="653" ind1="1" ind2=" ">
			<subfield code="a">Air traffic control</subfield>
		</datafield>
		<datafield tag="653" ind1="1" ind2=" ">
			<subfield code="a">Automatic Speech Recognition</subfield>
		</datafield>
		<datafield tag="653" ind1="1" ind2=" ">
			<subfield code="a">Contextual Adaptation</subfield>
		</datafield>
		<datafield tag="653" ind1="1" ind2=" ">
			<subfield code="a">language identification</subfield>
		</datafield>
		<datafield tag="653" ind1="1" ind2=" ">
			<subfield code="a">named entity recognition</subfield>
		</datafield>
		<datafield tag="653" ind1="1" ind2=" ">
			<subfield code="a">OpenSky Network</subfield>
		</datafield>
		<datafield tag="856" ind1="4" ind2="0">
			<subfield code="i">EXTERNAL</subfield>
			<subfield code="u">http://publications.idiap.ch/attachments/papers/2021/Kocour_9THOPENSKYSYMPOSIUM2020_2021.pdf</subfield>
			<subfield code="x">PUBLIC</subfield>
		</datafield>
		<datafield tag="711" ind1="2" ind2=" ">
			<subfield code="a">OpenSky Network - Proceedings of 9th OpenSky Symposium 2020</subfield>
			<subfield code="c">Brussels, Belgium</subfield>
		</datafield>
		<datafield tag="260" ind1=" " ind2=" ">
			<subfield code="c">2021</subfield>
			<subfield code="b">MDPI</subfield>
		</datafield>
		<datafield tag="773" ind1=" " ind2=" ">
			<subfield code="c">1-9</subfield>
		</datafield>
		<datafield tag="520" ind1=" " ind2=" ">
			<subfield code="a">This document describes our pipeline for automatic processing of ATCO-pilot audio communication, developed as part of the ATCO2 project. So far, we have collected two thousand hours of audio recordings, which we either pre-process for the transcribers or use for semi-supervised training. Both uses of the collected data can further improve our pipeline through model retraining.
The proposed automatic processing pipeline is a cascade of stand-alone components, namely: a) segmentation, b) volume control, c) signal-to-noise-ratio filtering, d) diarization, e) speech-to-text (ASR), f) English language detection, g) call-sign code recognition, h) ATCO-pilot classification, and i) highlighting of commands and values.
The key component of the pipeline is the speech-to-text transcription system, which must be trained on real-world ATC data; otherwise, its performance is poor. To further improve speech-to-text performance, we apply both semi-supervised training on our recordings and contextual adaptation, which uses a list of plausible call-signs from surveillance data as auxiliary information. The downstream NLP/NLU tasks are important from the application point of view: they need accurate models operating on real speech-to-text output, so they too require more data. Creating such ATC data is the main aspiration of the ATCO2 project. At the end of the project, the data will be packaged and distributed by ELDA.</subfield>
		</datafield>
	</record>
</collection>