Keywords:
- accent embedding
- Accented speech
- Accentual mismatch
- acoustic generators
- Acoustic model adaptation
- acoustic modeling
- adaptation
- ADS-B data
- air surveillance data
- Air traffic control
- air traffic control communications
- air traffic controller
- air traffic controller’s workload
- air traffic management
- Alzheimer's disease
- AM
- Anti-spoofing
- Arithmetic Coding
- Artificial intelligence
- Artificial Neural Networks
- ASR
- Assistant Based Speech Recognition
- association rules
- audio and voice analysis
- Audio Coding
- audiobook
- Automatic Speech Recognition
- automatic speech recognition and understanding
- automatic speech understanding
- batch norm
- batch normalization
- bayesian fusion
- BERT
- bias aware
- BNF
- Building Blocks
- call sign detection
- Call-sign Detection
- Call-sign Recognition
- chunking
- claim verification
- Command Prediction Model
- command recognition rate
- Confidence Measure (CM)
- Contextual Adaptation
- contextual biasing
- conversational modeling
- Convolutional Neural Networks
- Cross-modal Alignment
- Cross-modal Attention
- Customization of model
- data analysis
- deep learning
- Deep learning for speech
- deep MLPs
- Deep neural network
- deep neural networks
- Delays
- depression detection
- dialogue
- diarization
- direction of arrival
- direction-of-arrival estimation
- Discourse Annotation
- Discriminative features
- dnn
- DOA estimation
- domain adaptation
- dropout
- electronic flight strips
- Encoding
- end-to-end
- end-to-end ASR
- entity linking
- Entropy Coding
- Environmental mismatch
- Estimation
- F1 score
- face verification
- fact checking
- Feature extraction
- fine-tuning
- finite-state transducers
- FM
- fmllr
- Forensics
- Frequency Domain Linear Prediction (FDLP)
- gaming
- GDPR
- GMM
- GPU decoding
- Graph Convolutional Networks
- Graph Neural Networks
- high-definition video-conferencing
- HTK
- Huffman Coding
- human factors
- Human-Computer Interaction
- human-robot interaction
- hybrid system
- i-vector
- i-vectors
- Integration of prior knowledge
- Intent Classification
- inter-task fusion
- Interpretability
- Interpretable Models
- Iterative learning
- KeyWord Spotting (KWS)
- Keyword spotting detection
- KL-HMM
- knowledge distillation
- language identification
- Language IDentification (LID)
- language modeling
- Language Models
- Language Production
- Language targets
- Large Language Models
- Large Vocabulary Continuous Speech Recognition (LVCSR)
- Lattice-Free MMI
- LEA
- legal framework
- LID
- likelihood-based encoding
- limited training data
- Linear prediction
- logistic regression
- Low resource language
- low-resource
- LVCSR
- machine learning
- Machine Translation
- Mental Lexicon
- MFCC
- microphone arrays
- Microphones
- model adaptation
- multi-face tracking
- multi-lingual automatic speech recognition
- multi-lingual SAD
- Multi-modal Approach
- multi-modal database
- multi-task
- multilingual acoustic modeling
- Multilingual automatic speech recognition
- Multimodal machine translation
- multimodal signal processing
- multiple remote tower
- multiple sound sources
- multiple speaker detection
- multitask acoustic modeling
- multitask learning
- multitask training
- named entity recognition
- Natural language processing
- network output
- neural nets
- neural network
- neural network-based sound source localization methods
- neural networks
- node weighted graphs
- non-native speech
- online speech recognition
- OOV-word recognition
- open-architecture distributed system
- OpenSky Network
- Operant Motive Test
- OSINT
- out-of-domain
- Out-Of-Language (OOL) detection
- parametric speech synthesis
- parametric synthesis
- perceptual evaluation of audio quality (PEAQ)
- personal data processing
- PLDA
- Position measurement
- pseudo-labelling
- Psycholinguistics
- rare word recognition
- Rare-word integration
- Raw Speech
- real-time audio processing
- real-time processing
- real-time speech recognition
- recurrent neural network
- reinforcement learning
- reliability estimation
- Representation and Processing
- resources and evaluation
- Robots
- Robust Automatic Speech Recognition
- ROXANNE
- ROXSD
- safety
- self-supervised pre-training
- semi-supervised learning
- Semi-supervised training
- sensor fusion
- sentence embeddings
- Sentiment Analysis
- SGMM
- SGMM adaptation
- shallow fusion
- signal processing
- simultaneous detection
- single sound source
- situation awareness
- sound mixtures
- sound source localization
- spatial spectrum-based approaches
- speaker adaptation
- Speaker change detection
- speaker clustering
- Speaker identification
- speaker recognition
- speaker role classification
- speaker role detection
- speaker role identification
- speaker turn detection
- speaker verification
- Speech activity detection
- speech coding
- speech dataset
- speech decoding
- speech meta-data
- speech quality evaluations
- speech recognition
- speech synthesis
- speech understanding
- spoken dialogue systems
- Spoken Language Understanding
- Spoken Term Detection (STD)
- streaming transducer
- Subspace Gaussian Mixture Models
- subspace Gaussian mixture models
- supervised adaptation
- Supervised Autoencoders
- supervision
- System Combination
- Tandem
- task-oriented dialog
- Text classification
- Text Representation
- text to speech
- Text-based speaker diarization
- text-to-speech
- text-to-speech synthesis
- tower utterances
- TRACY
- Law Enforcement Agencies
- Suspect Detection
- Non-Content Data
- Social Influence Analysis
- Link Prediction
- training
- transfer learning
- transformers
- TTS
- Under-resourced data
- under-resourced languages
- under-resourced speech recognition
- unsupervised learning
- user identity linkage
- verification
- Very low bit rate speech coding
- voice-activity detection
- wav2vec 2.0
- wav2vec2
- weakly-supervised learning
- Web data
- weighted finite state transducer
- WFST
- Word Consensus Networks
- Word-Confusion-Networks
- XLS-R
- XLSR-Transducer
Publications of Petr Motlicek sorted by journal and type
EURASIP Journal on Audio, Speech, and Music Processing
Exploiting foreign resources for DNN-based ASR, in: EURASIP Journal on Audio, Speech, and Music Processing (2015:17), 2015
[DOI]
IEEE Signal Processing Letters
A Large-Scale Open-Source Acoustic Simulator for Speaker Recognition, in: IEEE Signal Processing Letters, 23(4):527-531, 2016
A Simple Continuous Pitch Estimation Algorithm, in: IEEE Signal Processing Letters, 20(1):102-105, 2013
[URL]
IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING
Autoregressive Models of Amplitude Modulations in Audio Compression, in: IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, 2010
[URL]
IEEE/ACM Transactions on Audio, Speech, and Language Processing
Neural Network Adaptation and Data Augmentation for Multi-Speaker Direction-of-Arrival Estimation, in: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29:1303-1317, 2021
[DOI] [URL]
IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING
Incremental Syllable-Context Phonetic Vocoding, in: IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, 23(6), 2015
[URL]
International Journal of Computer and Electrical Engineering
The TA2 Database – A Multi-Modal Database From Home Entertainment, in: International Journal of Computer and Electrical Engineering, 4(5):670-673, 2012
[URL]
Sadhana
Current trends in multilingual speech processing, in: Sadhana, 36(5):885–915, 2011
[DOI] [URL]
Speech Communication
Template-matching for Text-dependent Speaker Verification, in: Speech Communication, 2017
Feature mapping using far-field microphones for distant speech recognition, in: Speech Communication, 83:1-9, 2016
[DOI] [URL]
Using out-of-language data to improve an under-resourced speech recognizer, in: Speech Communication, 2013
[DOI] [URL]
The Prague Bulletin of Mathematical Linguistics
Inferring Highly-dense Representations for Clustering Broadcast Media Content, in: The Prague Bulletin of Mathematical Linguistics, 2020
[URL]
Publications of type Book
2020
OdiEnCorp 2.0: Odia-English Parallel Corpus for Machine Translation, European Language Resources Association (ELRA), 2020
[URL]
2012
Together Anywhere, Together Anytime, Technologies for Intimate Interactions, Centrum Wiskunde & Informatica, 2012
Handbook of Biometric Anti-Spoofing (2019)
Voice Presentation Attack Detection Using Convolutional Neural Networks, in: Handbook of Biometric Anti-Spoofing, pages 391-415, Springer, 2019
[URL]
Proceedings of the 49th IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP) 2024 (2024)
CONTEXTUAL BIASING METHODS FOR IMPROVING RARE WORD DETECTION IN AUTOMATIC SPEECH RECOGNITION, in: Proceedings of the 49th IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP) 2024, Seoul, Korea, 2024
Proceedings of the 6th Clinical Natural Language Processing Workshop (2024)
DAIC-WOZ: On the Validity of Using the Therapist's prompts in Automatic Depression Detection from Clinical Interviews, in: Proceedings of the 6th Clinical Natural Language Processing Workshop, Association for Computational Linguistics, 2024
15th EAI International Conference on Digital Forensics & Cyber Crime (2024)
Detecting Criminal Networks via Non-Content Communication Data Analysis Techniques from the TRACY Project, in: 15th EAI International Conference on Digital Forensics & Cyber Crime, 2024
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (2024)
Dialog2Flow: Pre-training Soft-Contrastive Action-Driven Sentence Embeddings for Automatic Dialog Flow Extraction, in: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, ACL, 2024
ECAI 2024 - 27th European Conference on Artificial Intelligence, October 19-24, 2024, Santiago de Compostela, Spain - Including 13th Conference on Prestigious Applications of Intelligent Systems (PAIS 2024), Proceedings (2024)
Entity Matching Across Small Networks Using Node Attributes, in: ECAI 2024 - 27th European Conference on Artificial Intelligence, October 19-24, 2024, Santiago de Compostela, Spain - Including 13th Conference on Prestigious Applications of Intelligent Systems (PAIS 2024), Proceedings, 2024
[DOI]
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (2024)
Fast Streaming Transducer ASR Prototyping via Knowledge Distillation with Whisper, in: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, ACL, 2024
Proceedings of the 49th IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP) (2024)
Fine-tuning Self-Supervised Models For Language Identification Using Orthonormal Constraint, in: Proceedings of the 49th IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP), 2024
Experimental IR Meets Multilinguality, Multimodality, and Interaction - 14th International Conference of the CLEF Association, CLEF, 2024, Grenoble, France, September 9-12, 2024, Proceedings (2024)
Mapping the Media Landscape: Predicting Factual Reporting and Political Bias Through Web Interactions, in: Experimental IR Meets Multilinguality, Multimodality, and Interaction - 14th International Conference of the CLEF Association, CLEF, 2024, Grenoble, France, September 9-12, 2024, Proceedings, 2024
Proceedings of the 49th IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP) 2024 (2024)
Multitask Speech Recognition and Speaker Change Detection for Unknown Number of Speakers, in: Proceedings of the 49th IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP) 2024, 2024
Odyssey 2024: The Speaker and Language Recognition Workshop (2024)
Normalizing Flows for Speaker and Language Recognition Backend, in: Odyssey 2024: The Speaker and Language Recognition Workshop, 2024
Proceedings of the 49th IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP) 2024 (2024)
Probability-Aware Word-Confusion-Network-to-Text Alignment Approach for Intent Classification, in: Proceedings of the 49th IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP) 2024, 2024
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics (2024)
Reliability Estimation of News Media Sources: Birds of a Feather Flock Together, in: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics, 2024
Odyssey 2024: The Speaker and Language Recognition Workshop (2024)
ROXSD: The ROXANNE Multimodal and Simulated Dataset for Advancing Criminal Investigations, in: Odyssey 2024: The Speaker and Language Recognition Workshop, pages 17-24, 2024
[DOI] [URL]
Interspeech 2024 (2024)
Speech and Language Recognition with Low-rank Adaptation of Pretrained Models, in: Interspeech 2024, 2024
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (2024)
TokenVerse: Towards Unifying Speech and NLP Tasks via Transducer-based ASR, in: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, ACL, 2024
[URL]
13th SESAR Innovation Days (2023)
Automatic Speech Analysis Framework for ATC Communication in HAAWAII, in: 13th SESAR Innovation Days, 2023
Fifteenth USA/Europe Air Traffic Management Research and Development Seminar (ATM2023) (2023)
Automatic Speech Recognition and Understanding for Radar Label Maintenance Support Increases Safety and Reduces Air Traffic Controllers’ Workload, in: Fifteenth USA/Europe Air Traffic Management Research and Development Seminar (ATM2023), Eurocontrol (Europe), FAA (U.S.), Savannah, Georgia, USA, 2023
[URL]
2023 IEEE Spoken Language Technology Workshop (SLT) (2023)
BERTraffic: BERT-based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications, in: 2023 IEEE Spoken Language Technology Workshop (SLT), IEEE, 2023
[URL]
Proc. 13th SESAR Innovation Days (2023)
Customization of Automatic Speech Recognition Engines for Rare Word Detection Without Costly Model Re-Training, in: Proc. 13th SESAR Innovation Days, Seville, Spain, 2023
[DOI] [URL]
Proceedings of the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (2023)
Effectiveness of Text, Acoustic, and Lattice-based representations in Spoken Language Understanding tasks, in: Proceedings of the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, 2023
2023 IEEE Spoken Language Technology Workshop (SLT) (2023)
How Does Pre-trained Wav2Vec 2.0 Perform on Domain-Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications, in: 2023 IEEE Spoken Language Technology Workshop (SLT), IEEE, 2023
[URL]
Proc. Interspeech 2023 (2023)
HyperConformer: Multi-head HyperMixer for Efficient Speech Recognition, in: Proc. Interspeech 2023, Ireland, 2023
Implementing contextual biasing in GPU decoder for online ASR, in: Proc. Interspeech 2023, 2023
Proceedings of Interspeech (2023)
Node-weighted Graph Convolutional Network for Depression Detection in Transcribed Clinical Interviews, in: Proceedings of Interspeech, 2023
Proc. of the IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU'23 (2023)
Parameter-Efficient Tuning With Adaptive Bottlenecks For Automatic Speech Recognition, in: Proc. of the IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU'23, 2023
Proc. of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2022)
A two-step approach to leverage contextual data: speech recognition in air-traffic communications, in: Proc. of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Pacific Asia Conference on Language, Information and Computation (PACLIC 36), In ACL Anthology Proceedings (2022)
An Empirical Comparison of Semantic Similarity Methods for Analyzing down-streaming Automatic Minuting task, in: Pacific Asia Conference on Language, Information and Computation (PACLIC 36), In ACL Anthology Proceedings, 2022
Pacific Asia Conference on Language, Information and Computation (PACLIC 36), In proceedings of ACL Anthology (2022)
An End-to-End Multilingual System for Automatic Minuting of Multi-Party Dialogues, in: Pacific Asia Conference on Language, Information and Computation (PACLIC 36), In proceedings of ACL Anthology, 2022
Automatic Summarization for Creative Writing, International Conference on Computational Linguistics (COLING 2022) (2022)
Automatic Summarization for Creative Writing: Denoising Auto-Encoder based Pipeline Method for Generating Summary of Movie Scripts, in: Automatic Summarization for Creative Writing, International Conference on Computational Linguistics (COLING 2022), 2022
PACIFIC ASIA CONFERENCE ON LANGUAGE, INFORMATION AND COMPUTATION (PACLIC 2022), In Proceedings of ACL Anthology (2022)
Bio-Medical Multi-label Scientific Literature Classification using LWAN and Dual-attention module, in: PACIFIC ASIA CONFERENCE ON LANGUAGE, INFORMATION AND COMPUTATION (PACLIC 2022), In Proceedings of ACL Anthology, 2022
Special Interest Group on Discourse and Dialogue (SIGDIAL 2022) (2022)
DeepCon: An End-to-End Multilingual Toolkit for Automatic Minuting of Multi-Party Dialogues, in: Special Interest Group on Discourse and Dialogue (SIGDIAL 2022), 2022
Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (2022)
Expanded Lattice Embeddings for Spoken Document Retrieval on Informal Meetings, in: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2022
[DOI]
12th SESAR Innovation Days (2022)
Grammar Based Speaker Role Identification for Air Traffic Control Speech Recognition, in: 12th SESAR Innovation Days, 2022
ACL (2022)
Hierarchical Multi-task learning framework for Isometric-Speech Language Translation, in: ACL, 2022
PACIFIC ASIA CONFERENCE ON LANGUAGE, INFORMATION AND COMPUTATION (PACLIC 2022), In Proceedings of ACL Anthology (2022)
HMIST: Hierarchical Multilingual Isometric Speech Translation using Multi-Task Learning Framework for Automatic Dubbing, in: PACIFIC ASIA CONFERENCE ON LANGUAGE, INFORMATION AND COMPUTATION (PACLIC 2022), In Proceedings of ACL Anthology, 2022