Keywords:
- accent embedding
- Accented speech
- Accentual mismatch
- acoustic generators
- Acoustic model adaptation
- acoustic modeling
- adaptation
- ADS-B data
- air surveillance data
- Air traffic control
- air traffic control communications
- air traffic controller
- air traffic controller’s workload
- air traffic management
- Alzheimer's disease
- AM
- Anti-spoofing
- Arithmetic Coding
- Artificial intelligence
- Artificial Neural Networks
- ASR
- Assistant Based Speech Recognition
- association rules
- audio and voice analysis
- Audio Coding
- audiobook
- Automatic Speech Recognition
- automatic speech recognition (ASR)
- automatic speech recognition and understanding
- automatic speech understanding
- batch norm
- batch normalization
- bayesian fusion
- BERT
- bias
- bias aware
- BNF
- Building Blocks
- call sign detection
- Call-sign Detection
- Call-sign Recognition
- chunking
- claim verification
- Command Prediction Model
- command recognition rate
- Confidence Measure (CM)
- Contextual Adaptation
- contextual biasing
- conversational modeling
- Convolutional Neural Networks
- Cross-modal Alignment
- Cross-modal Attention
- Customization of model
- data analysis
- Data Selection
- deep learning
- Deep learning for speech
- deep MLPs
- Deep neural network
- deep neural networks
- Delays
- depression detection
- dialogue
- diarization
- direction of arrival
- direction-of-arrival estimation
- Discourse Annotation
- Discriminative features
- dnn
- DOA estimation
- domain adaptation
- Domain Classification
- dropout
- electronic flight strips
- Encoding
- end-to-end
- end-to-end ASR
- entity linking
- Entropy Coding
- Environmental mismatch
- Estimation
- explainability
- F1 score
- face verification
- fact checking
- factual reporting
- Feature extraction
- fine-tuning
- finite-state transducers
- FM
- fmllr
- Forensics
- Frequency Domain Linear Prediction (FDLP)
- gaming
- GDPR
- GMM
- GPU decoding
- Graph Convolutional Networks
- Graph Neural Networks
- high-definition video-conferencing
- HTK
- Huffman Coding
- human factors
- Human-Computer Interaction
- human-robot interaction
- hybrid system
- i-vector
- i-vectors
- information verification
- Integration of prior knowledge
- Intent Classification
- inter-task fusion
- Interpretability
- Interpretable Models
- Iterative learning
- KeyWord Spotting (KWS)
- Keyword spotting detection
- KL-HMM
- knowledge distillation
- language identification
- Language IDentification (LID)
- language modeling
- Language Models
- Language Production
- Language targets
- Large Language Models
- Large Vocabulary Continuous Speech Recognition (LVCSR)
- Lattice-Free MMI
- LEA
- legal framework
- LID
- likelihood-based encoding
- limited training data
- Linear prediction
- logistic regression
- Low resource language
- low-resource
- LVCSR
- machine learning
- Machine Translation
- media bias
- Mental Lexicon
- MFCC
- microphone arrays
- Microphones
- model adaptation
- multi-face tracking
- multi-lingual automatic speech recognition
- multi-lingual SAD
- Multi-modal Approach
- multi-modal database
- multi-task
- multilingual acoustic modeling
- Multilingual automatic speech recognition
- Multimodal machine translation
- multimodal signal processing
- multiple remote tower
- multiple sound sources
- multiple speaker detection
- multitask acoustic modeling
- multitask learning
- multitask training
- named entity recognition
- Natural language processing
- network output
- neural nets
- neural network
- neural network-based sound source localization methods
- neural networks
- news media
- node weighted graphs
- non-native speech
- online speech recognition
- OOV-word recognition
- open-architecture distributed system
- OpenSky Network
- Operant Motive Test
- OSINT
- out-of-domain
- Out-Of-Language (OOL) detection
- parametric speech synthesis
- parametric synthesis
- perceptual evaluation of audio quality (PEAQ)
- personal data processing
- PLDA
- Position measurement
- pseudo-labelling
- Psycholinguistics
- rare word recognition
- Rare-word integration
- Raw Speech
- real-time audio processing
- real-time processing
- real-time speech recognition
- recurrent neural network
- reinforcement learning
- reliability estimation
- Representation and Processing
- resources and evaluation
- Robots
- Robust Automatic Speech Recognition
- ROXANNE
- ROXSD
- safety
- self-supervised pre-training
- semi-supervised learning
- Semi-supervised training
- sensor fusion
- sentence embeddings
- Sentiment Analysis
- SGMM
- SGMM adaptation
- shallow fusion
- signal processing
- simultaneous detection
- single sound source
- situation awareness
- sound mixtures
- sound source localization
- spatial spectrum-based approaches
- speaker adaptation
- Speaker change detection
- speaker clustering
- Speaker identification
- speaker recognition
- speaker role classification
- speaker role detection
- speaker role identification
- speaker turn detection
- speaker verification
- Speech activity detection
- speech coding
- speech dataset
- speech decoding
- speech meta-data
- speech quality evaluations
- speech recognition
- speech synthesis
- speech understanding
- spoken dialogue systems
- Spoken Language Understanding
- Spoken Term Detection (STD)
- streaming transducer
- Subspace Gaussian Mixture Models
- subspace Gaussian mixture models
- supervised adaptation
- Supervised Autoencoders
- supervision
- System Combination
- Tandem
- task-oriented dialog
- Text classification
- Text Representation
- text to speech
- Text-based speaker diarization
- text-to-speech
- text-to-speech synthesis
- tower utterances
- TRACY · Law Enforcement Agencies · Suspect Detection · Non-Content Data · Social Influence Analysis · Link Prediction
- training
- transfer learning
- transformers
- TTS
- Under-resourced data
- under-resourced languages
- under-resourced speech recognition
- unsupervised learning
- user identity linkage
- verification
- Very low bit rate speech coding
- voice-activity detection
- wav2vec 2.0
- wav2vec2
- weakly-supervised learning
- Web data
- weighted finite state transducer
- WFST
- whisper
- Word Consensus Networks
- Word-Confusion-Networks
- XLS-R
- XLSR-Transducer
- Zipformer
Publications of Petr Motlicek sorted by journal and type
Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (2022)
Expanded Lattice Embeddings for Spoken Document Retrieval on Informal Meetings, in: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2022
12th SESAR Innovation Days (2022)
Grammar Based Speaker Role Identification for Air Traffic Control Speech Recognition, in: 12th SESAR Innovation Days, 2022
ACL (2022)
Hierarchical Multi-task learning framework for Isometric-Speech Language Translation, in: ACL, 2022
PACIFIC ASIA CONFERENCE ON LANGUAGE, INFORMATION AND COMPUTATION (PACLIC 2022), In Proceedings of ACL Anthology (2022)
HMIST: Hierarchical Multilingual Isometric Speech Translation using Multi-Task Learning Framework for Automatic Dubbing, in: PACIFIC ASIA CONFERENCE ON LANGUAGE, INFORMATION AND COMPUTATION (PACLIC 2022), In Proceedings of ACL Anthology, 2022
ACL (2022)
IDIAP Submission@LT-EDI-ACL2022: Hope Speech Detection for Equality, Diversity and Inclusion, in: ACL, 2022
IDIAP Submission@LT-EDI-ACL2022: Detecting Signs of Depression from Social Media Text, in: ACL, 2022
ACL Proceedings (2022)
IDIAP Submission@LT-EDI-ACL2022: Homophobia/Transphobia Detection in social media comments, in: ACL Proceedings, 2022
The 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE @ EMNLP 2022) (2022)
IDIAPers @ Causal News Corpus 2022: Efficient Causal Relation Identification Through a Prompt-based Few-shot Approach, in: The 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE @ EMNLP 2022), 2022
IDIAPers @ Causal News Corpus 2022: Extracting Cause-Effect-Signal Triplets via Pre-trained Autoregressive Language Model, in: The 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE @ EMNLP 2022), 2022
ACL (2022)
IDIAP_TIET@LT-EDI-ACL2022: Hope Speech Detection in Social Media using Contextualized BERT with Attention Mechanism, in: ACL, 2022
International Conference on Computational Linguistics (COLING 2022) (2022)
Innovators@SMM4H'22: An Ensembles Approach for self-reporting of COVID-19 Vaccination Status Tweets, in: International Conference on Computational Linguistics (COLING 2022), 2022
Innovators@SMM4H'22: An Ensembles Approach for Stance and Premise Classification of COVID-19 Health Mandates Tweets, in: International Conference on Computational Linguistics (COLING 2022), 2022
12th SESAR Innovation Days (2022)
Speech and Natural Language Processing Technologies for Pseudo-Pilot Simulator, in: 12th SESAR Innovation Days, 2022
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022), NAACL 2022 (2022)
Team Innovators at SemEval-2022 for Task 8: Multi-Task Training with Hyperpartisan and Semantic Relation for Multi-Lingual News Article Similarity, in: Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022), NAACL 2022, 2022
2021 IEEE International Conference on Acoustics, Speech and Signal Processing (2021)
A COMPARISON OF METHODS FOR OOV-WORD RECOGNITION ON A NEW PUBLIC DATASET, in: 2021 IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE Signal Processing Society, Toronto, Ontario, Canada, 2021
2021 IEEE/AIAA 40th Digital Avionics Systems Conference (DASC) (2021)
Automated Interpretation of Air Traffic Control Communication: The Journey from Spoken Words to a Deeper Understanding of the Meaning, in: 2021 IEEE/AIAA 40th Digital Avionics Systems Conference (DASC), San Antonio, TX, USA, pages 1-9, IEEE, 2021
Proceedings of 9th OpenSky Symposium 2020 (2021)
Automatic processing pipeline for collecting and annotating air-traffic voice communication data, in: Proceedings of 9th OpenSky Symposium 2020, OpenSky Network, Brussels, Belgium, pages 1-9, MDPI, 2021
Interspeech 2021 (2021)
Boosting of contextual information in ASR for air-traffic call-sign recognition, in: Interspeech 2021, 2021
Contextual Semi-Supervised Learning: An Approach To Leverage Air-Surveillance and Untranscribed ATC Data in ASR Systems, in: Interspeech 2021, 2021
1st ISCA Symposium on Security and Privacy in Speech Communication (2021)
Graph2Speak: Improving Speaker Identification using Network Knowledge in Criminal Conversational Data, in: 1st ISCA Symposium on Security and Privacy in Speech Communication, pages 10-13, 2021
Proceedings of Interspeech 2021 (2021)
Late Fusion of the Available Lexicon and Raw Waveform-based Acoustic Modeling for Depression and Dementia Recognition, in: Proceedings of Interspeech 2021, ISCA-International Speech Communication Association 2021, 2021
11th SESAR Innovation Days (2021)
Measuring Speech Recognition And Understanding Performance in Air Traffic Control Domain Beyond Word Error Rates, in: 11th SESAR Innovation Days, 2021
Proceedings of Interspeech 2021 (2021)
Multi-task Neural Network for Robust Multiple Speaker Embedding Extraction, in: Proceedings of Interspeech 2021, 2021
Proceedings of the First Workshop on Multimodal Machine Translation for Low Resource Languages (MMTLRL 2021) (2021)
Multimodal Neural Machine Translation System for English to Bengali, in: Proceedings of the First Workshop on Multimodal Machine Translation for Low Resource Languages (MMTLRL 2021), Online (Virtual Mode), pages 31-39, INCOMA Ltd., 2021
Proceedings of Interspeech 2021 (2021)
Multitask adaptation with Lattice-Free MMI for multi-genre speech recognition of low resource languages, in: Proceedings of Interspeech 2021, 2021
Proceedings of the 8th Workshop on Asian Translation (WAT2021) (2021)
NLPHut's Participation at WAT2021, in: Proceedings of the 8th Workshop on Asian Translation (WAT2021), pages 146-154, Association for Computational Linguistics, 2021
Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas (2021)
Open Machine Translation for Low Resource South American Languages (AmericasNLP 2021 Shared Task Contribution), in: Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas, pages 218-223, Association for Computational Linguistics, 2021
1st ISCA Symposium on Security and Privacy in Speech Communication (2021)
Open-Set Speaker Identification pipeline in live criminal investigations, in: 1st ISCA Symposium on Security and Privacy in Speech Communication, 2021
Fourteenth USA/Europe Air Traffic Management Research and Development Seminar (ATM2021) (2021)
Readback Error Detection by Automatic Speech Recognition to Increase ATM Safety, in: Fourteenth USA/Europe Air Traffic Management Research and Development Seminar (ATM2021), The United States Federal Aviation Administration (FAA), EUROCONTROL, pages 10, 2021
Interspeech (2021)
Robust Command Recognition for Lithuanian Air Traffic Control Tower Utterances, in: Interspeech, 2021
Interspeech Show and Tell 2021 (2021)
ROXANNE Research Platform: Automate criminal investigations, in: Interspeech Show and Tell 2021, 2021
1st ISCA Symposium on Security and Privacy in Speech Communication (2021)
ROXSD: a Simulated Dataset of Communication in Organized Crime, in: 1st ISCA Symposium on Security and Privacy in Speech Communication, 2021
Interspeech (2021)
Speech Activity Detection Based on Multilingual Speech Recognition System, in: Interspeech, 2021
Proceedings of 8th OpenSky Symposium 2020 (2020)
Automatic Call Sign Detection: Matching Air Surveillance Data with Air Traffic Spoken Communications, in: Proceedings of 8th OpenSky Symposium 2020, OpenSky Network, pages 1-10, MDPI, 2020
Proc. Interspeech 2020 (2020)
Automatic Speech Recognition Benchmark for Air-Traffic Communications, in: Proc. Interspeech 2020, pages 2297-2301, 2020
Proceedings of the 17th International Conference on Natural Language Processing (2020)
BertAA: BERT fine-tuning for Authorship Attribution, in: Proceedings of the 17th International Conference on Natural Language Processing, 2020
Detection of Similar Languages and Dialects Using Deep Supervised Autoencoders, in: Proceedings of the 17th International Conference on Natural Language Processing, 2020
Proceedings of the GermEval 2020 Shared Task on the Classification and Regression of Cognitive and Motivational style from Text (2020)
Idiap & UAM participation at GermEval 2020: Classification and Regression of Cognitive and Motivational Style from Text, in: Proceedings of the GermEval 2020 Shared Task on the Classification and Regression of Cognitive and Motivational style from Text, 2020
Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020) co-located with 36th Conference of the Spanish Society for Natural Language Processing (SEPLN 2020) (2020)
Idiap and UAM Participation at MEX-A3T Evaluation Campaign, in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020) co-located with 36th Conference of the Spanish Society for Natural Language Processing (SEPLN 2020), pages 6, CEUR Workshop Proceedings, 2020
Proceedings of the 5th Swiss Text Analytics Conference (SwissText) & 16th Conference on Natural Language Processing (KONVENS) (2020)
Idiap Submission to Swiss-German Language Detection Shared Task, in: Proceedings of the 5th Swiss Text Analytics Conference (SwissText) & 16th Conference on Natural Language Processing (KONVENS), CEUR Workshop Proceedings, 2020
Proceedings of ICASSP 2020 (2020)
INCREMENTAL SEMI-SUPERVISED LEARNING FOR MULTI-GENRE SPEECH RECOGNITION, in: Proceedings of ICASSP 2020, 2020
In Proceedings of Interspeech 2020 (2020)
Lattice-Free Maximum Mutual Information Training of Multilingual Speech Recognition System, in: In Proceedings of Interspeech 2020, pages 4746-4750, ISCA, 2020
Proceedings of the 7th Workshop on Asian Translation (2020)
ODIANLP's Participation in WAT2020, in: Proceedings of the 7th Workshop on Asian Translation, ACL Anthology, 2020
Interspeech (2020)
Supervised domain adaptation for text-independent speaker verification using limited data, in: Interspeech, pages 3815-3819, 2020
Proceedings of the 29th IEEE International Conference on Robot & Human Interactive Communication (2020)
The MuMMER data set for Robot Perception in multi-party HRI Scenarios, in: Proceedings of the 29th IEEE International Conference on Robot & Human Interactive Communication, 2020
In Proceedings of ICASSP 2019 (2019)
A BAYESIAN APPROACH TO INTER-TASK FUSION FOR SPEAKER RECOGNITION, in: In Proceedings of ICASSP 2019, Brighton, England, pages 5786-5790, 2019
In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2019) (2019)
Abstract Text Summarization: A Low Resource Challenge, in: In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2019), Hong Kong, China, pages 5, Association for Computational Linguistics (ACL), 2019
Proceedings of the 2nd International Conference on Intelligent Human Systems Integration (IHSI 2019): Integrating People and Intelligent Systems (2019)
Adaptation of Assistant Based Speech Recognition to New Domains and Its Acceptance by Air Traffic Controllers, in: Proceedings of the 2nd International Conference on Intelligent Human Systems Integration (IHSI 2019): Integrating People and Intelligent Systems, San Diego, California, USA, pages 820-826, 2019
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2019)
Adaptation of Multiple Sound Source Localization Neural Networks with Weak Supervision and Domain-Adversarial Training, in: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Brighton, United Kingdom, pages 770-774, 2019
11th International workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (2019)
AM-FM DECOMPOSITION OF SPEECH SIGNAL: APPLICATIONS FOR SPEECH PRIVACY AND DIAGNOSIS, in: 11th International workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, Universita Degli Studi Firenze, Firenze, Italy, 2019