Keywords:
- accent embedding
- Accented speech
- Accentual mismatch
- acoustic generators
- Acoustic model adaptation
- acoustic modeling
- adaptation
- ADS-B data
- air surveillance data
- Air traffic control
- air traffic control communications
- air traffic controller
- air traffic controller’s workload
- air traffic management
- Alzheimer's disease
- AM
- Anti-spoofing
- Arithmetic Coding
- Artificial intelligence
- Artificial Neural Networks
- ASR
- Assistant Based Speech Recognition
- association rules
- audio and voice analysis
- Audio Coding
- audiobook
- Automatic Speech Recognition
- automatic speech recognition and understanding
- automatic speech understanding
- batch norm
- batch normalization
- bayesian fusion
- BERT
- bias aware
- BNF
- Building Blocks
- call sign detection
- Call-sign Detection
- Call-sign Recognition
- chunking
- claim verification
- Command Prediction Model
- command recognition rate
- Confidence Measure (CM)
- Contextual Adaptation
- contextual biasing
- conversational modeling
- Convolutional Neural Networks
- Cross-modal Alignment
- Cross-modal Attentio
- Cross-modal Attention
- Customization of model
- data analysis
- deep learning
- Deep learning for speech
- deep MLPs
- Deep neural network
- deep neural networks
- Delays
- depression detection
- dialogue
- diarization
- direction of arrival
- direction-of-arrival estimation
- Discourse Annotation
- Discriminative features
- dnn
- DOA estimation
- domain adaptation
- dropout
- electronic flight strips
- Encoding
- end-to-end
- end-to-end ASR
- entity linking
- Entropy Coding
- Environmental mismatch
- Estimation
- F1 score
- face verification
- fact checking
- Feature extraction
- fine-tuning
- finite-state transducers
- FM
- fmllr
- Forensics
- Frequency Domain Linear Prediction (FDLP)
- gaming
- GDPR
- GMM
- GPU decoding
- Graph Convolutional Networks
- Graph Neural Networks
- high-definition video-conferencing
- HTK
- Huffman Coding
- human factors
- Human-Computer Interaction
- human-robot interaction
- hybrid system
- i-vector
- i-vectors
- Integration of prior knowledge
- Intent Classification
- inter-task fusion
- Interpretability
- Interpretable Models
- Iterative learning
- KeyWord Spotting (KWS)
- Keyword spotting detection
- KL-HMM
- knowledge distillation
- lan- guage identification
- language identification
- Language IDentification (LID)
- language modeling
- Language Models
- Language Production
- Language targets
- Large Language Models
- Large Vocabulary Continuous Speech Recognition (LVCSR)
- Lattice-Free MMI
- LEA
- legal framework
- LID
- likelihood-based encoding
- limited training data
- Linear prediction
- logistic regression
- Low resource language
- low-resource
- LVCSR
- machine learning
- Machine Translation
- Mental Lexicon
- MFCC
- microphone arrays
- Microphones
- model adaptation
- multi-face tracking
- multi-lingual automatic speech recognition
- multi-lingual SAD
- Multi-modal Approach
- multi-modal database
- multi-task
- multilingual acoustic modeling
- Multilingual automatic speech recognition
- Multimodal machine translation
- multimodal signal processing
- multiple remote tower
- multiple sound sources
- multiple speaker detection
- multitask acoustic modeling
- multitask learning
- multitask training
- named entity recognition
- Natural language processing
- network output
- neural nets
- neural network
- neural network-based sound source localization methods
- neural networks
- node weighted graphs
- non-native speech
- online speech recognition
- OOV-word recognition
- open-architecture distributed system
- OpenSky Network
- Operant Motive Test
- OSINT
- Out- Of-Language (OOL) detection
- out-of-domain
- Out-Of-Language (OOL) detection
- parametric speech synthesis
- parametric synthesis
- perceptual evaluation of audio quality (PEAQ)
- personal data processing
- PLDA
- Position measurement
- pseudo-labelling
- Psycholinguistics
- rare word recognition
- Rare-word integration
- Raw Speech
- real-time audio processing
- real-time processing
- real-time speech recognition
- recurrent neural network
- reinforcement learning
- reliability estimation
- Representation and Processing
- resources and evaluation
- Robots
- Robust Automatic Speech Recognition
- ROXANNE
- ROXSD
- saftety
- self-supervised pre-training
- semi-supervised learning
- Semi-supervised training
- sensor fusion
- sentence embeddings
- Sentiment Analysis
- SGMM
- SGMM adaptation
- shallow fusion
- signal processing
- simultaneous detection
- single sound source
- situation awareness
- sound mixtures
- sound source localization
- spatial spectrum-based approaches
- speaker adaptation
- Speaker change detection
- speaker clustering
- Speaker identification
- speaker recognition
- speaker role classification
- speaker role detection
- speaker role identification
- speaker turn detection
- speaker verification
- Speech activity detection
- speech coding
- speech dataset
- speech decoding
- speech meta-data
- speech quality evaluations
- speech recognition
- speech synthesis
- speech understanding
- spoken dialogue systems
- Spoken Language Understanding
- Spoken Term Detection (STD)
- streaming transducer
- Subs-ace Gaussian Mixture Models
- subspace Gaussian mixture models
- supervised adaptation
- Supervised Autoencoders
- supervision
- System Combination
- Tandem
- task-oriented dialog
- Text classification
- Text Representation
- text to speech
- Text-based speaker diarization
- text-to-speech
- text-to-speech synthesis
- tower utterances
- TRACY · Law Enforcement Agencies · Suspect Detection· Non-Content Data· Social Influence Analysis· Link Prediction
- training
- transfer learning
- transformers
- TTS
- Under-resourced data
- under-resourced languages
- under-resourced speech recognition
- unsupervised learning
- user identity linkage
- verification
- Very low bit rate speech coding
- voice-activity detection
- wav2vec 2.0
- wav2vec2
- weakly-supervised learning.
- Web data
- weighted finite state transducer
- WFST
- Word Consensus Networks
- Word-Confusion-Networks
- XLS-R
- XLSR-Transducer
Publications of Petr Motlicek sorted by journal and type
Publications of type Idiap-RR
2012
Application of Subspace Gaussian Mixture Models in Contrastive Acoustic Scenarios, , , and , Idiap-RR-20-2012 |
|
Bi-Modal Authentication in Mobile Environments Using Session Variability Modelling, , , , and , Idiap-RR-18-2012 |
|
Conversion of Recurrent Neural Network Language Models to Weighted Finite State Transducers for Automatic Speech Recognition, and , Idiap-RR-21-2012 |
|
Impact du degré de supervision sur l'adaptation à un domaine d'un modèle de langage à partir du Web, , , and , Idiap-RR-23-2012 |
|
IMPROVING ACOUSTIC BASED KEYWORD SPOTTING USING LVCSR LATTICES, , and , Idiap-RR-36-2012 |
|
Progress report of a project in very low bit-rate speech coding, , and , Idiap-RR-08-2012 |
|
Supervised and unsupervised Web-based language model domain adaptation, , , and , Idiap-RR-22-2012 |
|
The Kaldi Speech Recognition Toolkit, , , , , , , , , , , , and , Idiap-RR-04-2012 |
|
2011
Just-in-Time Multimodal Association and Fusion from Home Entertainment, , , and , Idiap-RR-10-2011 |
|
Multimodal Cue Detection Engine for Orchestrated Entertainment, , , and , Idiap-RR-34-2011 |
|
2010
AMIDA/Klewel Mini-Project, , , and , Idiap-RR-03-2010 |
|
Application of Out-Of-Language Detection To Spoken-Term Detection, and , Idiap-RR-04-2010 |
|
English Spoken Term Detection in Multilingual Recordings, , and , Idiap-RR-21-2010 |
|
Hands Free Audio Analysis from Home Entertainment, , and , Idiap-RR-27-2010 |
|
The DIRAC AWEAR Audio-Visual Platform for Detection of Unexpected and Incongruent Events, , , , , , , , , , , , , and , Idiap-RR-41-2010 |
|
The TA2 Database - A Multi-Modal Database from Home Entertainment, , and , Idiap-RR-37-2010 |
|
2009
APPLICATIONS OF SIGNAL ANALYSIS USING AUTOREGRESSIVE MODELS FOR AMPLITUDE MODULATION, , , and , Idiap-RR-35-2009 |
|
Automatic Out-of-Language Detection based on Confidence Measures derived from LVCSR Word and Phone Lattices, , Idiap-RR-06-2009 |
|
Autoregressive Models of Amplitude Modulations in Audio Compression, , and , Idiap-RR-33-2009 |
|
MDCT for Encoding Residual Signals in Frequency Domain Linear Prediction, , and , Idiap-RR-34-2009 |
|
2008
Autoregressive Modelling of Hilbert Envelopes for Wide-band Audio Coding, , , and , Idiap-RR-40-2008 |
|
Entropy coding of Quantized Spectral Components in FDLP audio codec, , and , Idiap-RR-71-2008 |
|
Exploiting temporal context for speech/non-speech detection, , and , Idiap-RR-21-2008 |
|
Low-Delay Error Resilient Speech Coding Using Sub-band Hilbert Envelopes, , and , Idiap-RR-75-2008 |
|
MODIFIED DISCRETE COSINE TRANSFORM FOR ENCODING RESIDUAL SIGNALS IN FREQUENCY DOMAIN LINEAR PREDICTION, , and , Idiap-RR-74-2008 |
|
Spectral Noise Shaping: Improvements in Speech/Audio Codec Based on Linear Prediction in Spectral Domain, , , and , Idiap-RR-16-2008 |
|
2007
LP-TRAPs in all senses, , Idiap-RR-66-2007 |
|
Non-uniform QMF Decomposition for Wide-band Audio Coding based on Frequency Domain Linear Prediction, , , and , Idiap-RR-43-2007 |
|
Scalable Wide-band Audio Codec based on Frequency Domain Linear Prediction, , , and , Idiap-RR-16-2007 |
|
Temporal Masking for Bit-rate Reduction in Audio Codec Based on Frequency Domain Linear Prediction, , , and , Idiap-RR-48-2007 |
|
2006
Audio Coding Based on Long Temporal Contexts, , , and , Idiap-RR-30-2006 |
|
Audio Coding Based on Long Temporal Segments: Experiments With Quantization of Excitation Signal, and , Idiap-RR-46-2006 |
|
Speech Coding based on Spectral Dynamics, , , and , Idiap-RR-05-2006 |
|
Unsupervised Speech/Non-speech Detection for Automatic Speech Recognition in Meeting Rooms, , and , Idiap-RR-57-2006 |
|
Wide-Band Perceptual Audio Coding based on Frequency-Domain Linear Prediction, , and , Idiap-RR-58-2006 |
|
2009
Wide-Band Audio Coding based on Frequency Domain Linear Prediction, , and , Idiap-RR-32-2009 |
|
Publications of type Idiap-Com
2022
Claim-Dissector: An Interpretable Fact-Checking System with Joint Re-ranking and Veracity Prediction, , and , Idiap-Com-03-2022 |
[URL] |
2019
Adaptation of Multiple Sound Source Localization Neural Networks with Weak Supervision and Domain-Adversarial Training, , and , Idiap-Com-01-2019 |
|
2016
Integration of Real-Time Speech Processing Technologies for Online Gaming, , and , Idiap-Com-01-2016 |
|
2013
Who Wants To Be A Millionaire? (II), , and , Idiap-Com-02-2013 |
|
2012
Who Wants To Be A Millionaire?, , , and , Idiap-Com-03-2012 |
|
2011
Domain-specific language model adaptation: a case study, , and , Idiap-Com-01-2013 |
|
Advances in Multimedia
Real-Time Audio-Visual Analysis for Multiperson Videoconferencing, , , , , , , , and , in: Advances in Multimedia, 2013:21, 2013 |
[DOI] [URL] |
Aerospace
A Virtual Simulation-Pilot Agent for Training of Air Traffic Controllers, , , , and , in: Aerospace, 10(5), 2023 |
[DOI] [URL] |
An Automatic Speaker Clustering Pipeline for the Air Traffic Communication Domain, , , , , , and , in: Aerospace, 10(10):876, 2023 |
[DOI] [URL] |
Assistant Based Speech Recognition Support for Air Traffic Controllers in a Multiple Remote Tower Environment, , , , , , , , , , , , and , in: Aerospace, 10(6), 2023 |
[DOI] [URL] |
Lessons Learned in Transcribing 5000 h of Air Traffic Control Communications for Robust Automatic Speech Understanding, , , , , , , , , , and , in: Aerospace, 10(10):898, 2023 |
[DOI] [URL] |
Association for Computational Linguistics
Claim-Dissector: An Interpretable Fact-Checking System with Joint Re-ranking and Veracity Prediction, , and , in: Association for Computational Linguistics, Findings of the Association for Computational Linguistics: ACL 2023:10184–10205, 2023 |
[URL] |
Cognitive Computation
Applying Attention-Based Models for Detecting Cognitive Processes and Mental Health Conditions, , , and , in: Cognitive Computation:18, 2021 |
[DOI] [URL] |
EURASIP Journal on Audio Speech and Music Processing
Wide-Band Audio Coding based on Frequency Domain Linear Prediction, , , and , in: EURASIP Journal on Audio Speech and Music Processing, 2010(856280), 2010 |
[DOI] [URL] |