Keywords:
- accent embedding
- Accented speech
- Accentual mismatch
- acoustic generators
- Acoustic model adaptation
- acoustic modeling
- adaptation
- ADS-B data
- air surveillance data
- Air traffic control
- air traffic control communications
- air traffic controller
- air traffic controller’s workload
- air traffic management
- Alzheimer's disease
- AM
- Anti-spoofing
- Arithmetic Coding
- Artificial intelligence
- Artificial Neural Networks
- ASR
- Assistant Based Speech Recognition
- association rules
- audio and voice analysis
- Audio Coding
- audiobook
- Automatic Speech Recognition
- automatic speech recognition and understanding
- automatic speech understanding
- batch norm
- batch normalization
- bayesian fusion
- BERT
- bias aware
- BNF
- Building Blocks
- call sign detection
- Call-sign Detection
- Call-sign Recognition
- chunking
- claim verification
- Command Prediction Model
- command recognition rate
- Confidence Measure (CM)
- Contextual Adaptation
- contextual biasing
- conversational modeling
- Convolutional Neural Networks
- Cross-modal Alignment
- Cross-modal Attention
- Customization of model
- data analysis
- deep learning
- Deep learning for speech
- deep MLPs
- Deep neural network
- deep neural networks
- Delays
- depression detection
- dialogue
- diarization
- direction of arrival
- direction-of-arrival estimation
- Discourse Annotation
- Discriminative features
- dnn
- DOA estimation
- domain adaptation
- dropout
- electronic flight strips
- Encoding
- end-to-end
- end-to-end ASR
- entity linking
- Entropy Coding
- Environmental mismatch
- Estimation
- F1 score
- face verification
- fact checking
- Feature extraction
- fine-tuning
- finite-state transducers
- FM
- fmllr
- Forensics
- Frequency Domain Linear Prediction (FDLP)
- gaming
- GDPR
- GMM
- GPU decoding
- Graph Convolutional Networks
- Graph Neural Networks
- high-definition video-conferencing
- HTK
- Huffman Coding
- human factors
- Human-Computer Interaction
- human-robot interaction
- hybrid system
- i-vector
- i-vectors
- Integration of prior knowledge
- Intent Classification
- inter-task fusion
- Interpretability
- Interpretable Models
- Iterative learning
- KeyWord Spotting (KWS)
- Keyword spotting detection
- KL-HMM
- knowledge distillation
- language identification
- Language IDentification (LID)
- language modeling
- Language Models
- Language Production
- Language targets
- Large Language Models
- Large Vocabulary Continuous Speech Recognition (LVCSR)
- Lattice-Free MMI
- LEA
- legal framework
- LID
- likelihood-based encoding
- limited training data
- Linear prediction
- logistic regression
- Low resource language
- low-resource
- LVCSR
- machine learning
- Machine Translation
- Mental Lexicon
- MFCC
- microphone arrays
- Microphones
- model adaptation
- multi-face tracking
- multi-lingual automatic speech recognition
- multi-lingual SAD
- Multi-modal Approach
- multi-modal database
- multi-task
- multilingual acoustic modeling
- Multilingual automatic speech recognition
- Multimodal machine translation
- multimodal signal processing
- multiple remote tower
- multiple sound sources
- multiple speaker detection
- multitask acoustic modeling
- multitask learning
- multitask training
- named entity recognition
- Natural language processing
- network output
- neural nets
- neural network
- neural network-based sound source localization methods
- neural networks
- node weighted graphs
- non-native speech
- online speech recognition
- OOV-word recognition
- open-architecture distributed system
- OpenSky Network
- Operant Motive Test
- OSINT
- out-of-domain
- Out-Of-Language (OOL) detection
- parametric speech synthesis
- parametric synthesis
- perceptual evaluation of audio quality (PEAQ)
- personal data processing
- PLDA
- Position measurement
- pseudo-labelling
- Psycholinguistics
- rare word recognition
- Rare-word integration
- Raw Speech
- real-time audio processing
- real-time processing
- real-time speech recognition
- recurrent neural network
- reinforcement learning
- reliability estimation
- Representation and Processing
- resources and evaluation
- Robots
- Robust Automatic Speech Recognition
- ROXANNE
- ROXSD
- safety
- self-supervised pre-training
- semi-supervised learning
- Semi-supervised training
- sensor fusion
- sentence embeddings
- Sentiment Analysis
- SGMM
- SGMM adaptation
- shallow fusion
- signal processing
- simultaneous detection
- single sound source
- situation awareness
- sound mixtures
- sound source localization
- spatial spectrum-based approaches
- speaker adaptation
- Speaker change detection
- speaker clustering
- Speaker identification
- speaker recognition
- speaker role classification
- speaker role detection
- speaker role identification
- speaker turn detection
- speaker verification
- Speech activity detection
- speech coding
- speech dataset
- speech decoding
- speech meta-data
- speech quality evaluations
- speech recognition
- speech synthesis
- speech understanding
- spoken dialogue systems
- Spoken Language Understanding
- Spoken Term Detection (STD)
- streaming transducer
- Subspace Gaussian Mixture Models
- subspace Gaussian mixture models
- supervised adaptation
- Supervised Autoencoders
- supervision
- System Combination
- Tandem
- task-oriented dialog
- Text classification
- Text Representation
- text to speech
- Text-based speaker diarization
- text-to-speech
- text-to-speech synthesis
- tower utterances
- TRACY · Law Enforcement Agencies · Suspect Detection · Non-Content Data · Social Influence Analysis · Link Prediction
- training
- transfer learning
- transformers
- TTS
- Under-resourced data
- under-resourced languages
- under-resourced speech recognition
- unsupervised learning
- user identity linkage
- verification
- Very low bit rate speech coding
- voice-activity detection
- wav2vec 2.0
- wav2vec2
- weakly-supervised learning
- Web data
- weighted finite state transducer
- WFST
- Word Consensus Networks
- Word-Confusion-Networks
- XLS-R
- XLSR-Transducer
Publications of Petr Motlicek
2021
Contextual Semi-Supervised Learning: An Approach To Leverage Air-Surveillance and Untranscribed ATC Data in ASR Systems, , , , , , and , Idiap-RR-14-2021 |
[URL] |
Contextual Semi-Supervised Learning: An Approach To Leverage Air-Surveillance and Untranscribed ATC Data in ASR Systems, , , , , , and , in: Interspeech 2021, 2021 |
[URL] |
Cross-lingual Automatic Speech Recognition Exploiting Articulatory Features, , , , and , Idiap-RR-05-2021 |
[URL] |
Grammar Based Identification Of Speaker Role For Improving ATCO And Pilot ASR, , , , , , and , Idiap-RR-22-2021 |
|
Graph2Speak: Improving Speaker Identification using Network Knowledge in Criminal Conversational Data, , , and , in: 1st ISCA Symposium on Security and Privacy in Speech Communication, pages 10--13, 2021 |
[DOI] |
Improving callsign recognition with air-surveillance data in air-traffic communication, , , and , Idiap-RR-20-2021 |
[URL] |
Late Fusion of the Available Lexicon and Raw Waveform-based Acoustic Modeling for Depression and Dementia Recognition, , , , , and , Idiap-RR-09-2021 |
|
Late Fusion of the Available Lexicon and Raw Waveform-based Acoustic Modeling for Depression and Dementia Recognition, , , , , and , in: Proceedings of Interspeech 2021, ISCA-International Speech Communication Association 2021, 2021 |
|
Measuring Speech Recognition And Understanding Performance in Air Traffic Control Domain Beyond Word Error Rates, , , , , , , , and , in: 11th SESAR Innovation Days, 2021 |
|
Multi-task Neural Network for Robust Multiple Speaker Embedding Extraction, , and , in: Proceedings of Interspeech 2021, 2021 |
Multimodal Neural Machine Translation System for English to Bengali, , , , , , and , Idiap-RR-13-2021 |
|
Multimodal Neural Machine Translation System for English to Bengali, , , , , , and , in: Proceedings of the First Workshop on Multimodal Machine Translation for Low Resource Languages (MMTLRL 2021), Online (Virtual Mode), pages 31--39, INCOMA Ltd., 2021 |
[URL] |
Multitask adaptation with Lattice-Free MMI for multi-genre speech recognition of low resource languages, , and , in: Proceedings of Interspeech 2021, 2021 |
|
Neural Network Adaptation and Data Augmentation for Multi-Speaker Direction-of-Arrival Estimation, , and , in: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29:1303-1317, 2021 |
[DOI] [URL] |
NLPHut's Participation at WAT2021, , , , , , , and , in: Proceedings of the 8th Workshop on Asian Translation (WAT2021), pages 146--154, Association for Computational Linguistics, 2021 |
[URL] |
NLPHut's Participation at WAT2021, , , , , , , and , Idiap-RR-10-2021 |
|
Open Machine Translation for Low Resource South American Languages (AmericasNLP 2021 Shared Task Contribution), , , , , , , , and , Idiap-RR-07-2021 |
|
Open Machine Translation for Low Resource South American Languages (AmericasNLP 2021 Shared Task Contribution), , , , , , , , and , in: Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas, pages 218–223, Association for Computational Linguistics, 2021 |
[DOI] [URL] |
Open-Set Speaker Identification pipeline in live criminal investigations, and , in: 1st ISCA Symposium on Security and Privacy in Speech Communication, 2021 |
|
Readback Error Detection by Automatic Speech Recognition to Increase ATM Safety, , , , , , , , , , , , , and , in: Fourteenth USA/Europe Air Traffic Management Research and Development Seminar (ATM2021), The United States Federal Aviation Administration (FAA), EUROCONTROL, pages 10, 2021 |
[URL] |
Robust Command Recognition for Lithuanian Air Traffic Control Tower Utterances, , , , , , , and , in: Interspeech, 2021 |
|
ROXANNE Research Platform: Automate criminal investigations, , , , , and , in: Interspeech Show and Tell 2021, 2021 |
|
ROXSD: a Simulated Dataset of Communication in Organized Crime, , , , and , in: 1st ISCA Symposium on Security and Privacy in Speech Communication, 2021 |
|
Speech Activity Detection Based on Multilingual Speech Recognition System, , and , in: Interspeech, 2021 |
|
2020
A Bayesian Approach to Inter-Task Fusion for Speaker Recognition, , and , Idiap-RR-07-2020 |
|
AM-FM Decomposition of Speech Signal: Applications for Speech Privacy and Diagnosis, , , , and , Idiap-RR-01-2020 |
|
Automatic Call Sign Detection: Matching Air Surveillance Data with Air Traffic Spoken Communications, , , , , , , , , , , , , , , and , in: Proceedings of 8th OpenSky Symposium 2020, OpenSky Network, pages 1-10, MDPI, 2020 |
[DOI] [URL] |
Automatic Speech Recognition Benchmark for Air-Traffic Communications, , , , and , in: Proc. Interspeech 2020, pages 2297-2301, 2020 |
[DOI] |
BertAA: BERT fine-tuning for Authorship Attribution, , , and , in: Proceedings of the 17th International Conference on Natural Language Processing, 2020 |
|
Detection of Similar Languages and Dialects Using Deep Supervised Autoencoders, , , , and , in: Proceedings of the 17th International Conference on Natural Language Processing, 2020 |
|
German News Article Classification: A Multichannel CNN Approach, , and , Idiap-RR-09-2020 |
|
Idiap & UAM participation at GermEval 2020: Classification and Regression of Cognitive and Motivational Style from Text, , , , and , in: Proceedings of the GermEval 2020 Shared Task on the Classification and Regression of Cognitive and Motivational style from Text, 2020 |
[URL] |
Idiap Abstract Text Summarization System for German Text Summarization Task, and , Idiap-RR-03-2020 |
|
Idiap and UAM Participation at MEX-A3T Evaluation Campaign, , , , and , in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020) co-located with 36th Conference of the Spanish Society for Natural Language Processing (SEPLN 2020), pages 6, CEUR Workshop Proceedings, 2020 |
[URL] |
Idiap NMT System for WAT 2019 Multimodal Translation Task, and , Idiap-RR-04-2020 |
|
Idiap Submission to Swiss-German Language Detection Shared Task, , , , and , Idiap-RR-11-2020 |
|
Idiap Submission to Swiss-German Language Detection Shared Task, , , , and , in: Proceedings of the 5th Swiss Text Analytics Conference (SwissText) & 16th Conference on Natural Language Processing (KONVENS), CEUR Workshop Proceedings, 2020 |
[URL] |
Incremental Semi-Supervised Learning for Multi-Genre Speech Recognition, , , , , and , in: Proceedings of ICASSP 2020, 2020 |
|
Inferring Highly-dense Representations for Clustering Broadcast Media Content, , , and , in: The Prague Bulletin of Mathematical Linguistics, 2020 |
[URL] |
Language model domain adaptation for automatic speech recognition, , and , Idiap-RR-05-2020 |
|
Lattice-Free Maximum Mutual Information Training of Multilingual Speech Recognition System, , , , , and , Idiap-RR-28-2020 |
|
Lattice-Free Maximum Mutual Information Training of Multilingual Speech Recognition System, , , , , and , in: Proceedings of Interspeech 2020, pages 4746--4750, ISCA, 2020 |
|