Keywords:
- accent embedding
- Accented speech
- Accentual mismatch
- acoustic generators
- Acoustic model adaptation
- acoustic modeling
- adaptation
- ADS-B data
- Aho-Corasick algorithm
- air surveillance data
- Air traffic control
- air traffic control communications
- air traffic controller
- air traffic controller’s workload
- air traffic management
- Air-Traffic Communication (ATC)
- Alzheimer's disease
- AM
- Anti-spoofing
- Arithmetic Coding
- Artificial intelligence
- Artificial Neural Networks
- ASR
- ASR robustness
- Assistant Based Speech Recognition
- association rules
- audio and voice analysis
- Audio Coding
- audiobook
- Automatic Depression Detection
- Automatic Speech Recognition
- automatic speech recognition (ASR)
- automatic speech recognition and understanding
- automatic speech understanding
- batch norm
- batch normalization
- bayesian fusion
- Benchmarking
- BERT
- bias
- bias aware
- BNF
- Building Blocks
- call sign detection
- Call-sign Detection
- Call-sign Recognition
- chunking
- claim verification
- Clinical Interviews
- Command Prediction Model
- command recognition rate
- Confidence Measure (CM)
- Contextual Adaptation
- contextual biasing
- Contextualisation and adaptation of ASR
- conversational AI
- conversational modeling
- Convolutional Neural Networks
- Criminal investigations
- Cross-modal Alignment
- Cross-modal Attention
- Customization of model
- data analysis
- Data Selection
- dataset
- deep learning
- Deep learning for speech
- deep MLPs
- Deep neural network
- deep neural networks
- Delays
- Depression Corpora
- depression detection
- dialogue
- dialogue simulation
- diarization
- direction of arrival
- direction-of-arrival estimation
- Discourse Annotation
- Discriminative features
- DISPLACE-2
- dnn
- DOA estimation
- domain adaptation
- Domain Classification
- dropout
- Dual mode encoder
- ECAPA-TDNN embedding
- electronic flight strips
- embedding
- Encoding
- end-to-end
- end-to-end ASR
- entity linking
- Entropy Coding
- Environmental mismatch
- Estimation
- explainability
- F1 score
- face verification
- fact checking
- factual reporting
- Feature extraction
- fine-tuning
- finite-state transducers
- FM
- fmllr
- Forensics
- Foundation Models
- Frequency Domain Linear Prediction (FDLP)
- fvae-lora
- gaming
- GDPR
- GMM
- GPU decoding
- Graph Convolutional Network (GCN)
- Graph Convolutional Networks
- Graph Neural Networks
- high-definition video-conferencing
- HTK
- Huffman Coding
- human factors
- Human-Computer Interaction
- human-robot interaction
- hybrid system
- i-vector
- i-vectors
- information verification
- Integration of prior knowledge
- Intent Classification
- inter-task fusion
- Interpretability
- Interpretable Models
- Iterative learning
- KeyWord Spotting (KWS)
- Keyword spotting detection
- KL-HMM
- knowledge distillation
- language identification
- Language IDentification (LID)
- language modeling
- Language Models
- Language Production
- Language targets
- Large Language Models
- Large Vocabulary Continuous Speech Recognition (LVCSR)
- latent space factorization
- Lattice-Free MMI
- LEA
- legal framework
- LID
- likelihood-based encoding
- limited training data
- Linear prediction
- LLM
- LLM-based ASR
- local speaker segmentation
- logistic regression
- LoRA
- Low resource language
- low-rank adaptation
- low-resource
- LVCSR
- machine learning
- Machine Translation
- media bias
- Mental Lexicon
- MFCC
- microphone arrays
- Microphones
- model adaptation
- multi-face tracking
- multi-lingual automatic speech recognition
- multi-lingual SAD
- Multi-modal Approach
- multi-modal database
- multi-task
- multilingual acoustic modeling
- Multilingual automatic speech recognition
- Multimodal machine translation
- multimodal signal processing
- multiple remote tower
- multiple sound sources
- multiple speaker detection
- multitask acoustic modeling
- multitask learning
- multitask training
- named entity recognition
- Natural language processing
- network analysis
- network output
- neural nets
- neural network
- neural network-based sound source localization methods
- neural networks
- news media
- node weighted graphs
- non-native speech
- online speech recognition
- OOV-word recognition
- open-architecture distributed system
- OpenSky Network
- Operant Motive Test
- orchestration
- OSINT
- out-of-domain
- Out-Of-Language (OOL) detection
- parametric speech synthesis
- parametric synthesis
- perceptual evaluation of audio quality (PEAQ)
- personal data processing
- personas
- PLDA
- Position measurement
- prompt projection
- Prompting
- pseudo-labelling
- Psycholinguistics
- rare word recognition
- Rare-word integration
- Raw Speech
- real-time ASR
- real-time audio processing
- real-time processing
- real-time speech recognition
- recurrent neural network
- reinforcement learning
- reliability estimation
- Representation and Processing
- reproducibility
- resources and evaluation
- Robots
- Robust Automatic Speech Recognition
- ROXANNE
- ROXSD
- safety
- scenario management
- self-supervised learning
- self-supervised pre-training
- semi-supervised learning
- Semi-supervised training
- sensor fusion
- sentence embeddings
- Sentiment Analysis
- SGMM
- SGMM adaptation
- shallow fusion
- signal processing
- simultaneous detection
- single sound source
- situation awareness
- sound mixtures
- sound source localization
- spatial spectrum-based approaches
- speaker adaptation
- Speaker change detection
- speaker clustering
- Speaker Diarization
- Speaker identification
- speaker recognition
- speaker role classification
- speaker role detection
- speaker role identification
- speaker turn detection
- speaker verification
- Speech activity detection
- speech coding
- speech dataset
- speech decoding
- speech meta-data
- speech quality evaluations
- speech recognition
- speech synthesis
- speech understanding
- Speech-to-LLM alignment
- speech-to-text alignment
- spoken dialogue systems
- Spoken Language Understanding
- Spoken Term Detection (STD)
- spurious correlation robustness
- streaming ASR
- streaming transducer
- Subspace Gaussian Mixture Models
- subspace Gaussian mixture models
- supervised adaptation
- Supervised Autoencoders
- supervision
- synthetic dialogue
- System Combination
- Tandem
- task-oriented dialog
- Text classification
- text denoising
- Text fine-tuning
- Text Representation
- text to speech
- Text-based speaker diarization
- text-to-speech
- text-to-speech synthesis
- tower utterances
- TRACY · Law Enforcement Agencies · Suspect Detection · Non-Content Data · Social Influence Analysis · Link Prediction
- TRACY · Non-Content Data · Law Enforcement Agencies · Suspect Detection · Mobile Signaling Data · ROXANNE
- training
- transfer learning
- transformer transducer
- transformers
- TTS
- Under-resourced data
- under-resourced languages
- under-resourced speech recognition
- unsupervised learning
- user identity linkage
- verification
- Very low bit rate speech coding
- voice-activity detection
- wav2vec 2.0
- wav2vec2
- weakly-supervised learning
- Web data
- weighted finite state transducer
- WFST
- whisper
- Whisper models
- Word Consensus Networks
- Word-Confusion-Networks
- XLS-R
- XLSR
- XLSR-Transducer
- Zipformer
Publications of Petr Motlicek sorted by title
A
| Autocrime - open multimodal platform for combating organized crime, , , , , , , , , , , , , , , , , and , in: Forensic Science International: Digital Investigation, 54, 2025 |
[DOI] [URL] |
| Automated Interpretation of Air Traffic Control Communication: The Journey from Spoken Words to a Deeper Understanding of the Meaning, , , , , , , and , in: 2021 IEEE/AIAA 40th Digital Avionics Systems Conference (DASC), San Antonio, TX, USA, pages 1-9, IEEE, 2021 |
[DOI] |
| Automatic Call Sign Detection: Matching Air Surveillance Data with Air Traffic Spoken Communications, , , , , , , , , , , , , , , and , in: Proceedings of 8th OpenSky Symposium 2020, OpenSky Network, pages 1-10, MDPI, 2020 |
[DOI] [URL] |
| Automatic Out-of-Language Detection based on Confidence Measures derived from LVCSR Word and Phone Lattices, , Idiap-RR-06-2009 |
|
| Automatic Out-of-Language Detection Based on Confidence Measures Derived from LVCSR Word and Phone Lattices, , in: 10th Annual Conference of the International Speech Communication Association, ISCA, Brighton, England, 2009 |
|
| Automatic processing pipeline for collecting and annotating air-traffic voice communication data, , , , , , , , , and , in: Proceedings of 9th OpenSky Symposium 2020, OpenSky Network, Brussels, Belgium, pages 1-9, MDPI, 2021 |
|
| Automatic Speech Analysis Framework for ATC Communication in HAAWAII, , , , , and , in: 13th SESAR Innovation Days, 2023 |
|
| Automatic Speech Indexing System of Bilingual Video Parliament Interventions, , , , , and , Idiap-RR-25-2013 |
|
| Automatic Speech Recognition and Understanding for Radar Label Maintenance Support Increases Safety and Reduces Air Traffic Controllers’ Workload, , , , , , , , , , , and , in: Fifteenth USA/Europe Air Traffic Management Research and Development Seminar (ATM2023), Eurocontrol (Europe), FAA (U.S.), Savannah, Georgia, USA, 2023 |
[URL] |
| Automatic Speech Recognition Benchmark for Air-Traffic Communications, , , , and , in: Proc. Interspeech 2020, pages 2297-2301, 2020 |
[DOI] |
| Automatic Summarization for Creative Writing: Denoising Auto-Encoder based Pipeline Method for Generating Summary of Movie Scripts, , , , and , in: Automatic Summarization for Creative Writing, International Conference on Computational Linguistics (COLING 2022), 2022 |
|
| Autoregressive Modelling of Hilbert Envelopes for Wide-band Audio Coding, , , and , in: AES 124th Convention, Audio Engineering Society, 2008 |
|
| Autoregressive Modelling of Hilbert Envelopes for Wide-band Audio Coding, , , and , Idiap-RR-40-2008 |
|
| Autoregressive Models of Amplitude Modulations in Audio Compression, , and , Idiap-RR-33-2009 |
|
| Autoregressive Models of Amplitude Modulations in Audio Compression, , and , in: IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, 2010 |
[URL] |
B
| BertAA: BERT fine-tuning for Authorship Attribution, , , and , in: Proceedings of the 17th International Conference on Natural Language Processing, 2020 |
|
| BertOdia: BERT pre-training for low resource Odia language, , , , , and , Idiap-RR-16-2021 |
|
| BERTraffic: A Robust BERT-Based Approach for Speaker Change Detection and Role Identification of Air-Traffic Communications, , , , , , and , Idiap-RR-15-2021 |
| BERTraffic: BERT-based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications, , , , , , and , in: 2023 IEEE Spoken Language Technology Workshop (SLT), IEEE, 2023 |
[URL] |
| Better Semi-supervised Learning for Multi-domain ASR Through Incremental Retraining and Data Filtering, , , , , , , , , , , , and , in: Interspeech 2025, Rotterdam, The Netherlands, pages 3618-3622, 2025 |
[DOI] [URL] |
| Bi-Modal Authentication in Mobile Environments Using Session Variability Modelling, , , , and , Idiap-RR-18-2012 |
|
| Bi-Modal Authentication in Mobile Environments Using Session Variability Modelling, , , , and , in: Proceedings of the 21st International Conference on Pattern Recognition, 2012 |
|
| Bio-Medical Multi-label Scientific Literature Classification using LWAN and Dual-attention module, , , , and , in: PACIFIC ASIA CONFERENCE ON LANGUAGE, INFORMATION AND COMPUTATION (PACLIC 2022), In Proceedings of ACL Anthology, 2022 |
|
| Boosting of contextual information in ASR for air-traffic call-sign recognition, , , , , , , and , in: Interspeech 2021, 2021 |
|
| Broadcast Media Content Categorization Using Low-Resolution Concepts, , , , and , Idiap-RR-06-2021 |
|
| Building Blocks of Assistant Based Speech Recognition for Air Traffic Management Applications, , , , , , , and , in: SESAR Innovation Days 2018, European Union, Eurocontrol, Salzburg, Austria, SESARJU, 2018 |
[URL] |
C
| CHALLENGES IN BROADCAST MEDIA CONTENT CATEGORIZATION, , and , Idiap-RR-02-2021 |
|
| Channel Selection in the Short-time Modulation Domain for Distant Speech Recognition, , , , and , in: Proceedings of Interspeech, pages 741-745, 2015 |
|
| Channel Selection in the Short-time Modulation Domain for Distant Speech Recognition; Comparison with the Envelope-Variance Measure, , , , and , Idiap-RR-30-2015 |
|
| Claim-Dissector: An Interpretable Fact-Checking System with Joint Re-ranking and Veracity Prediction, , and , Idiap-Com-03-2022 |
[URL] |
| Claim-Dissector: An Interpretable Fact-Checking System with Joint Re-ranking and Veracity Prediction, , and , in: Findings of the Association for Computational Linguistics: ACL 2023, pages 10184-10205, Association for Computational Linguistics, 2023 |
[URL] |
| COMBINING SGMM SPEAKER VECTORS AND KL-HMM APPROACH FOR SPEAKER DIARIZATION, , and , Idiap-RR-17-2015 |
|
| COMBINING SGMM SPEAKER VECTORS AND KL-HMM APPROACH FOR SPEAKER DIARIZATION, , and , in: Proceedings of ICASSP 2015, pages 4834-4837, 2015 |
|
| Comparing different acoustic modeling techniques for multilingual boosting, , , , and , in: Proceedings of Interspeech, Portland, Oregon, 2012 |
|
| Comparing different acoustic modeling techniques for multilingual boosting, , , , and , Idiap-RR-01-2013 |
|
| Content Normalization for Text-dependent Speaker Verification, , , and , in: Proc. of Interspeech, 2017 |
|
| CONTENT NORMALIZATION FOR TEXT-INDEPENDENT SPEAKER VERIFICATION, , , and , Idiap-RR-31-2017 |
|
| CONTEXTUAL BIASING METHODS FOR IMPROVING RARE WORD DETECTION IN AUTOMATIC SPEECH RECOGNITION, , , , , , , and , in: Proceedings of the 49th IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP) 2024, Seoul, Korea, 2024 |
|
| Contextual Semi-Supervised Learning: An Approach To Leverage Air-Surveillance and Untranscribed ATC Data in ASR Systems, , , , , , and , Idiap-RR-14-2021 |
[URL] |
| Contextual Semi-Supervised Learning: An Approach To Leverage Air-Surveillance and Untranscribed ATC Data in ASR Systems, , , , , , and , in: Interspeech 2021, 2021 |
[URL] |
| Conversion of Recurrent Neural Network Language Models to Weighted Finite State Transducers for Automatic Speech Recognition, and , Idiap-RR-21-2012 |
|
| Conversion of Recurrent Neural Network Language Models to Weighted Finite State Transducers for Automatic Speech Recognition, and , in: Proceedings of Interspeech, Portland, Oregon, USA, pages to appear, 2012 |
|
| Cross-lingual Automatic Speech Recognition Exploiting Articulatory Features, , , , and , Idiap-RR-05-2021 |
[URL] |
| Cross-lingual Automatic Speech Recognition Exploiting Articulatory Features, , , , , and , in: Proceedings of APSIPA ASC 2019, 2019 |
| Crosslingual Tandem-SGMM: Exploiting Out-Of-Language Data for Acoustic Model and Feature Level Adaptation, , and , Idiap-RR-39-2013 |
|