Keywords:
- accent embedding
- Accented speech
- Accentual mismatch
- acoustic generators
- Acoustic model adaptation
- acoustic modeling
- adaptation
- ADS-B data
- air surveillance data
- Air traffic control
- air traffic control communications
- air traffic controller
- air traffic controller’s workload
- air traffic management
- Alzheimer's disease
- AM
- Anti-spoofing
- Arithmetic Coding
- Artificial intelligence
- Artificial Neural Networks
- ASR
- Assistant Based Speech Recognition
- association rules
- audio and voice analysis
- Audio Coding
- audiobook
- Automatic Speech Recognition
- automatic speech recognition and understanding
- automatic speech understanding
- batch norm
- batch normalization
- bayesian fusion
- BERT
- bias aware
- BNF
- Building Blocks
- call sign detection
- Call-sign Detection
- Call-sign Recognition
- chunking
- claim verification
- Command Prediction Model
- command recognition rate
- Confidence Measure (CM)
- Contextual Adaptation
- contextual biasing
- Convolutional Neural Networks
- Cross-modal Alignment
- Cross-modal Attention
- Customization of model
- data analysis
- deep learning
- Deep learning for speech
- deep MLPs
- Deep neural network
- deep neural networks
- Delays
- depression detection
- dialogue
- diarization
- direction of arrival
- direction-of-arrival estimation
- Discourse Annotation
- Discriminative features
- dnn
- DOA estimation
- domain adaptation
- dropout
- electronic flight strips
- Encoding
- end-to-end
- end-to-end ASR
- entity linking
- Entropy Coding
- Environmental mismatch
- Estimation
- F1 score
- face verification
- fact checking
- Feature extraction
- fine-tuning
- finite-state transducers
- FM
- fmllr
- Forensics
- Frequency Domain Linear Prediction (FDLP)
- gaming
- GDPR
- GMM
- GPU decoding
- Graph Convolutional Networks
- Graph Neural Networks
- high-definition video-conferencing
- HTK
- Huffman Coding
- human factors
- Human-Computer Interaction
- human-robot interaction
- hybrid system
- i-vector
- i-vectors
- Integration of prior knowledge
- Intent Classification
- inter-task fusion
- Interpretability
- Interpretable Models
- Iterative learning
- KeyWord Spotting (KWS)
- Keyword spotting detection
- KL-HMM
- knowledge distillation
- language identification
- Language IDentification (LID)
- language modeling
- Language Models
- Language Production
- Language targets
- Large Vocabulary Continuous Speech Recognition (LVCSR)
- Lattice-Free MMI
- LEA
- legal framework
- LID
- likelihood-based encoding
- limited training data
- Linear prediction
- logistic regression
- Low resource language
- low-resource
- LVCSR
- machine learning
- Machine Translation
- Mental Lexicon
- MFCC
- microphone arrays
- Microphones
- model adaptation
- multi-face tracking
- multi-lingual automatic speech recognition
- multi-lingual SAD
- Multi-modal Approach
- multi-modal database
- multi-task
- multilingual acoustic modeling
- Multilingual automatic speech recognition
- Multimodal machine translation
- multimodal signal processing
- multiple remote tower
- multiple sound sources
- multiple speaker detection
- multitask acoustic modeling
- multitask learning
- named entity recognition
- Natural language processing
- network output
- neural nets
- neural network
- neural network-based sound source localization methods
- neural networks
- node weighted graphs
- non-native speech
- online speech recognition
- OOV-word recognition
- open-architecture distributed system
- OpenSky Network
- Operant Motive Test
- OSINT
- out-of-domain
- Out-Of-Language (OOL) detection
- parametric speech synthesis
- parametric synthesis
- perceptual evaluation of audio quality (PEAQ)
- personal data processing
- PLDA
- Position measurement
- Psycholinguistics
- rare word recognition
- Rare-word integration
- Raw Speech
- real-time audio processing
- real-time processing
- real-time speech recognition
- recurrent neural network
- reinforcement learning
- reliability estimation
- Representation and Processing
- resources and evaluation
- Robots
- Robust Automatic Speech Recognition
- safety
- self-supervised pre-training
- semi-supervised learning
- Semi-supervised training
- sensor fusion
- SGMM
- SGMM adaptation
- signal processing
- simultaneous detection
- single sound source
- situation awareness
- sound mixtures
- sound source localization
- spatial spectrum-based approaches
- speaker adaptation
- Speaker change detection
- speaker clustering
- Speaker identification
- speaker recognition
- speaker role classification
- speaker role detection
- speaker role identification
- speaker turn detection
- speaker verification
- Speech activity detection
- speech coding
- speech dataset
- speech decoding
- speech meta-data
- speech quality evaluations
- speech recognition
- speech synthesis
- speech understanding
- Spoken Language Understanding
- Spoken Term Detection (STD)
- Subspace Gaussian Mixture Models
- subspace Gaussian mixture models
- supervised adaptation
- Supervised Autoencoders
- supervision
- System Combination
- Tandem
- Text classification
- Text Representation
- text to speech
- Text-based speaker diarization
- text-to-speech
- text-to-speech synthesis
- tower utterances
- training
- transfer learning
- transformers
- TTS
- Under-resourced data
- under-resourced languages
- under-resourced speech recognition
- unsupervised learning
- user identity linkage
- verification
- Very low bit rate speech coding
- voice-activity detection
- wav2vec 2.0
- wav2vec2
- weakly-supervised learning
- Web data
- weighted finite state transducer
- WFST
- Word Consensus Networks
- Word-Confusion-Networks
- XLS-R
Publications of Petr Motlicek sorted by journal and type
Proceedings of International Conference on Acoustics, Speech and Signal Processing (2017)
INTRA-CLASS COVARIANCE ADAPTATION IN PLDA BACK-ENDS FOR SPEAKER VERIFICATION, , , and , in: Proceedings of International Conference on Acoustics, Speech and Signal Processing, pages 5365-5369, 2017 |
[DOI] |
Proceedings of Interspeech 2017 (2017)
Semi-supervised Learning with Semantic Knowledge Extraction for Improved Speech Recognition in Air Traffic Control, , , , , and , in: Proceedings of Interspeech 2017, Stockholm, Sweden, pages 2406-2410, 2017 |
[DOI] |
European Intelligence and Security Informatics Conference (EISIC) 2017 (2017)
Towards a breakthrough Speaker Identification approach for Law Enforcement Agencies: SIIP, , , , , , , , , and , in: European Intelligence and Security Informatics Conference (EISIC) 2017, Athens, Greece, pages 32-39, IEEE Computer Society, 2017 |
[DOI] [URL] |
Proceedings of 2016 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2016) (2016)
DEEP NEURAL NETWORK BASED POSTERIORS FOR TEXT-DEPENDENT SPEAKER VERIFICATION, , , and , in: Proceedings of 2016 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2016), Shanghai, pages 5050-5054, IEEE, 2016 |
INFORMATION THEORETIC CLUSTERING FOR UNSUPERVISED DOMAIN-ADAPTATION, , and , in: Proceedings of 2016 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2016), Shanghai, pages 5580-5584, IEEE, 2016 |
Proceedings of the INTERSPEECH (2016)
Inter-task System Fusion for Speaker Recognition, , , , and , in: Proceedings of the INTERSPEECH, 2016 |
Proceedings of the International Workshop on Spoken Language Translation (2016)
Investigating Cross-lingual Multi-level Adaptive Networks: The Importance of the Correlation of Source and Target Languages, , , , and , in: Proceedings of the International Workshop on Spoken Language Translation, Seattle, WA, USA, 2016 |
Proceedings of 2016 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2016) (2016)
SYSTEM FUSION AND SPEAKER LINKING FOR LONGITUDINAL DIARIZATION OF TV SHOWS, , , and , in: Proceedings of 2016 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2016), Shanghai, pages 5495-5499, IEEE, 2016 |
Proceedings of Interspeech (2015)
Channel Selection in the Short-time Modulation Domain for Distant Speech Recognition, , , , and , in: Proceedings of Interspeech, pages 741-745, 2015 |
Proceedings of ICASSP 2015 (2015)
COMBINING SGMM SPEAKER VECTORS AND KL-HMM APPROACH FOR SPEAKER DIARIZATION, , and , in: Proceedings of ICASSP 2015, pages 4834-4837, 2015 |
2015 IEEE International Conference on Acoustics, Speech, and Signal Processing (2015)
EMPLOYMENT OF SUBSPACE GAUSSIAN MIXTURE MODELS IN SPEAKER RECOGNITION, , , and , in: 2015 IEEE International Conference on Acoustics, Speech, and Signal Processing, IEEE, Brisbane, Australia, pages 4445-4449, 2015 |
[URL] |
Proceedings of Interspeech 2015 (2015)
Integrating Online I-vector extractor with Information Bottleneck based Speaker Diarization system, , , and , in: Proceedings of Interspeech 2015, pages 3105-3109, 2015 |
IEEE International Conference on Acoustics, Speech, and Signal Processing (2015)
Learning Feature Mapping using Deep Neural Network Bottleneck Features for Distant Large Vocabulary Speech Recognition, , , , , and , in: IEEE International Conference on Acoustics, Speech, and Signal Processing, pages 4540-4544, 2015 |
[DOI] |
IEEE Automatic Speech Recognition and Understanding Workshop (2015)
Towards utterance-based neural network adaptation in acoustic modeling, , , and , in: IEEE Automatic Speech Recognition and Understanding Workshop, pages 289-295, 2015 |
Proceedings of the 15th Annual Conference of the International Speech Communication Association (Interspeech 2014) (2014)
Development of Bilingual ASR System for MediaParl Corpus, , , and , in: Proceedings of the 15th Annual Conference of the International Speech Communication Association (Interspeech 2014), Singapore, ISCA, 2014 |
Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing (2014)
Exploiting un-transcribed foreign data for speech recognition in well-resourced languages, , , , and , in: Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, Florence, IT, pages 2322-2326, IEEE, 2014 |
[DOI] |
Multilingual Deep Neural Network based Acoustic Modeling For Rapid Language Adaptation, , , , , and , in: Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, Florence, pages 7639-7643, IEEE, 2014 |
[DOI] |
Interspeech (2014)
Phoneme Background Model for Information Bottleneck based Speaker Diarization, , and , in: Interspeech, Singapore, 2014 |
Stress and Accent Transmission In HMM-Based Syllable-Context Very Low Bit Rate Speech Coding, , , and , in: Interspeech, 2014 |
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14) (2014)
The DBOX Corpus Collection of Spoken Human-Human and Human-Machine Dialogues, , , , , , , , , , , , , and , in: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), Reykjavik, Iceland, European Language Resources Association (ELRA), 2014 |
[URL] |
The 38th International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2013)
ACCENT ADAPTATION USING SUBSPACE GAUSSIAN MIXTURE MODELS, , , and , in: The 38th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE, Vancouver, BC, Canada, pages 7170-7174, 2013 |
[DOI] |
Proceedings of the 14th Annual Conference of the International Speech Communication Association (Interspeech 2013) (2013)
Crosslingual Tandem-SGMM: Exploiting Out-Of-Language Data for Acoustic Model and Feature Level Adaptation, , and , in: Proceedings of the 14th Annual Conference of the International Speech Communication Association (Interspeech 2013), ISCA - International Speech Communication Association, Lyon, France, pages 510-514, ISCA, 2013 |
The 38th International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2013)
FEATURE AND SCORE LEVEL COMBINATION OF SUBSPACE GAUSSIANS IN LVCSR TASK, , and , in: The 38th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE, Vancouver, BC, Canada, pages 7604-7608, 2013 |
[DOI] |
Proceedings of the IEEE workshop on Automatic Speech Recognition and Understanding (2013)
Impact of deep MLP architecture on different acoustic modeling techniques for under-resourced speech recognition, , , and , in: Proceedings of the IEEE workshop on Automatic Speech Recognition and Understanding, 2013 |
Proceedings of the IEEE Intl. Conference on Acoustics, Speech and Signal Processing (ICASSP) (2013)
On the (Un)importance of the Contextual Factors In HMM-Based Speech Synthesis, , and , in: Proceedings of the IEEE Intl. Conference on Acoustics, Speech and Signal Processing (ICASSP), Vancouver, Canada, pages 8140-8143, 2013 |
Proceedings of Interspeech 2012 (2012)
Annotation and Recognition of Personality Traits in Spoken Conversations from the AMI Meetings Corpus, , and , in: Proceedings of Interspeech 2012, 2012 |
Proceedings of the 11th International Conference on Mobile and Ubiquitous Multimedia (2012)
Assessing the Impact of Language Style on Emergent Leadership Perception from Ubiquitous Audio, , and , in: Proceedings of the 11th International Conference on Mobile and Ubiquitous Multimedia, Ulm, Germany, 2012 |
Proceedings of the 21st International Conference on Pattern Recognition (2012)
Bi-Modal Authentication in Mobile Environments Using Session Variability Modelling, , , , and , in: Proceedings of the 21st International Conference on Pattern Recognition, 2012 |
Proceedings of Interspeech (2012)
Comparing different acoustic modeling techniques for multilingual boosting, , , , and , in: Proceedings of Interspeech, Portland, Oregon, 2012 |
Conversion of Recurrent Neural Network Language Models to Weighted Finite State Transducers for Automatic Speech Recognition, and , in: Proceedings of Interspeech, Portland, Oregon, USA, pages to appear, 2012 |
IEEE Content Based Multimedia Indexing (2012)
Detecting and Labeling Folk Literature in Spoken Cultural Heritage Archives using Structural and Prosodic Features, and , in: IEEE Content Based Multimedia Indexing, 2012 |
Proceedings of 2012 IEEE International Conference on Acoustics, Speech and Signal Processing. (2012)
Generating Exact Lattices in The WFST Framework, , , , , , , , , , , , and , in: Proceedings of 2012 IEEE International Conference on Acoustics, Speech and Signal Processing., The 37th International Conference on Acoustics, Speech, and Signal Processing, Kyoto, Japan, pages 4213-4216, IEEE Signal Processing Society, 2012 |
[DOI] |
Actes de la conférence conjointe JEP-TALN-RECITAL 2012 (2012)
Impact du degré de supervision sur l'adaptation à un domaine d'un modèle de langage à partir du Web, , , and , in: Actes de la conférence conjointe JEP-TALN-RECITAL 2012, Grenoble, France, pages 193-200, ATALA/AFCP, 2012 |
Proceedings on IEEE International Conference on Acoustics, Speech and Signal Processing (2012)
IMPROVING ACOUSTIC BASED KEYWORD SPOTTING USING LVCSR LATTICES, , and , in: Proceedings on IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, Japan, pages 4413-4416, 2012 |
Proceedings International Conference on MultiMedia Modeling (2012)
Multimodal Cue Detection Engine for Orchestrated Entertainment, , , and , in: Proceedings International Conference on MultiMedia Modeling, Klagenfurt, Austria, 2012 |
Proceedings of Interspeech (2012)
Supervised and unsupervised Web-based language model domain adaptation, , , and , in: Proceedings of Interspeech, Portland, Oregon, USA, pages to appear, 2012 |
Proceedings IEEE International Conference on Multimedia & Expo (2011)
Just-in-Time Multimodal Association and Fusion from Home Entertainment, , , and , in: Proceedings IEEE International Conference on Multimedia & Expo, Barcelona, Spain, 2011 |
Proceedings of International Conference on Acoustics, Speech and Signal Processing (2011)
MULTISTREAM SPEAKER DIARIZATION THROUGH INFORMATION BOTTLENECK SYSTEM OUTPUTS COMBINATION, , and , in: Proceedings of International Conference on Acoustics, Speech and Signal Processing, 2011 |
Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2011)
Speaker Diarization of Meetings based on Speaker Role N-gram Models, , and , in: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011 |
IEEE 2011 Workshop on Automatic Speech Recognition and Understanding (2011)
The Kaldi Speech Recognition Toolkit, , , , , , , , , , , , and , in: IEEE 2011 Workshop on Automatic Speech Recognition and Understanding, Hilton Waikoloa Village, Big Island, Hawaii, US, IEEE Signal Processing Society, 2011 |
International Conference on Signal Acquisition and Processing (2011)
The TA2 Database - A Multi-Modal Database from Home Entertainment, , and , in: International Conference on Signal Acquisition and Processing, Singapore, 2011 |
2010 IEEE International Conference on Acoustics, Speech and Signal Processing (2010)
Application of Out-Of-Language Detection To Spoken-Term Detection, and , in: 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, Dallas, USA, 2010 |
Proceedings of Interspeech, Makuhari, Japan, 2010 (2010)
English Spoken Term Detection in Multilingual Recordings, , and , in: Proceedings of Interspeech, Makuhari, Japan, 2010, ISCA, Makuhari, Japan, 2010 |
Proceedings of Interspeech (2010)
Hands Free Audio Analysis from Home Entertainment, , and , in: Proceedings of Interspeech, Makuhari, Japan, 2010 |
Proceedings of ICASSP (2010)
VARIATIONAL BAYESIAN SPEAKER DIARIZATION OF MEETING RECORDINGS, , and , in: Proceedings of ICASSP, 2010 |
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2009, WASPAA '09. (2009)
APPLICATIONS OF SIGNAL ANALYSIS USING AUTOREGRESSIVE MODELS FOR AMPLITUDE MODULATION, , , and , in: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2009, WASPAA '09., IEEE, Mohonk Mountain House, New Paltz, New York, USA, 2009 |
[URL] |
10th Annual Conference of the International Speech Communication Association (2009)
Arithmetic Coding of Sub-Band Residuals in FDLP Speech/Audio Codec, , and , in: 10th Annual Conference of the International Speech Communication Association, ISCA, Brighton, England, ISCA 2009, 2009 |
12th International Conference on Text, Speech and Dialogue, TSD 2009 (2009)
Error Resilient Speech Coding Using Sub-band Hilbert Envelopes, , and , in: 12th International Conference on Text, Speech and Dialogue, TSD 2009, Pilsen, Czech Republic, Springer-Verlag, Berlin Heidelberg 2009, 2009 |