Keywords:
- Acoustic features
- acoustic modeling
- Adaboost
- Alzheimer's disease
- Anti-spoofing
- articulatory features
- Artificial Neural Networks
- ASR
- atypical speech
- audio deepfake
- Automatic accent assessment
- Automatic accent evaluation
- automatic gender recognition
- Automatic speaker verification (ASV)
- Automatic Speech Recognition
- automatic subword unit derivation
- bag of audio words
- bandwidth
- Benchmarking
- Binary features
- binary masking
- bioacoustics
- Blizzard Challenge
- BoAW
- boosting
- breathing pattern estimation
- breathing patterns
- call type classification
- call-type and caller classification
- Children speech recognition
- Classification
- CNN visualization
- ComParE features
- computational efficiency
- Conditional Random Fields
- confidence measures
- continuous speech recognition, boosted binary features, resource management
- Convolutional Neural Network
- Convolutional neural network
- Convolutional Neural Networks
- COVID-19 identification
- cross-database
- cross-transfer knowledge
- Customer satisfaction
- deep learning
- deep neural networks
- deepfake detection
- depression detection
- detection
- Direction of arrival estimation
- discretization
- domain adaptation
- dynamic programming
- Dysarthria
- Dysarthric speech
- embedding
- Emotion Recognition
- emotional prosody
- end-to-end acoustic modeling
- End-to-end learning
- end-to-end modelling
- end-to-end training
- expected performance and spoofability curve
- Expressive Vocalizations
- feature representations
- feature selection
- Few-shot learning
- fine-tuning
- Finetuning
- fixed-size word patterns
- Formant identification
- Formants
- Foundation Model
- Foundation Models
- Fundamental frequency
- Fusion
- Gaussian mixture
- glottal source signals
- grapheme
- Grapheme subword units
- grapheme subwords
- grapheme-to-phoneme conversion
- grapheme-to-phoneme converter
- Graphemes
- Hidden Markov Model
- hidden Markov models
- human skeleton estimation
- human speech
- hypoglycemia
- integration of ASV and anti-spoofing
- Interpretable Models
- Interpretable features
- isolated word recognition
- Kalman filters
- KL-divergence
- KL-HMM
- Kullback-Leibler divergence
- Kullback-Leibler divergence based hidden Markov model
- Kullback-Leibler divergence based HMM
- Kullback–Leibler divergence based hidden Markov model
- language disorder
- Language Production
- Large Language Models
- leaderboard
- letter-to-sound rules
- lexical model
- Lexical modeling
- Lexicon
- local posterior probability
- localization
- long-term statistics
- LoRA
- low level descriptors
- Low resource language
- machine learning
- Mental Lexicon
- microphone array
- microphone arrays
- mobile biometrics
- modalities fusion
- modified ZFF
- multi-layer perceptron
- Multi-modal Approach
- multi-stream combination
- Multi-task learning
- multilayer perceptron
- multilayer perceptron network
- Multilingual
- multilingual acoustic modeling
- Multimodal
- multiple linear regression
- Multiple speaker localization
- multiple speakers
- multiple-stream combination
- multitask learning
- neural network
- neurocomputational models
- Noise Robustness
- non-native speech
- non-native speech recognition
- noninvasive
- Objective Evaluation
- Objective intelligibility
- Objective intelligibility Assessment
- objective measures
- overlapping speech recognition
- Paralinguistic speech processing
- Parkinson's disease
- Parkinson's disease detection
- Parkinson’s disease
- parts-based approach
- Pathological speech
- Pathological Speech Processing
- PC-GITA
- PEFT
- Perceived fluency
- phoneme
- phoneme modeling
- Phoneme recognition
- phoneme subword units
- phoneme subwords
- phonemes
- Phonetic information
- phonetic representation
- Phonocardiogram
- Posterior features
- posterior probabilities
- pre-trained embedding
- pre-training domain
- predictive coding
- presentation attack
- Presentation Attack Detection
- probabilistic lexical modeling
- pronunciation generation
- pronunciation lexicon
- quantization
- Raw Speech
- raw waveform modelling
- raw waveforms
- raw-waveform CNN
- Reading Assessment
- recognition
- recurrent neural network
- Respiratory parameters
- S1-S2 detection
- Scottish Gaelic
- segment-level training
- Self-Organizing Maps
- Self-supervised embedding
- self-supervised learning
- sentence mode prediction
- sign language assessment
- Sign language processing
- signal processing
- sleepiness
- speaker verification
- speaker-specific features
- spectral statistics
- Speech Analysis
- speech and audio
- speech assessment
- Speech breathing
- Speech Emotion Recognition
- Speech enhancement
- Speech for health
- Speech Foundation Models
- Speech in health
- Speech intelligibility
- speech pathology detection
- speech recognition
- speech separation
- speech synthesis
- Speech technology
- Spoken Language Understanding
- Spoofing
- spoofing detection
- Steered response power
- String matching
- SVM
- syllable-level features
- syllables
- synthetic reference templates
- Synthetic speech
- TANDEM features
- template-based approach
- template-based system
- Text classification
- text-to-speech synthesis
- token sequences
- tracking
- under-resource speech recognition
- under-resourced languages
- universal phoneme set
- unsupervised adaptation
- utterance verification
- voice
- voice activity detection
- Voice Conversion
- wav2vec 2.0
- zero frequency filter
- Zero frequency filtering
- zero-frequency filtering
- zero-resourced speech recognition
- Zero-shot Speech Synthesis
Publications of Mathew Magimai-Doss sorted by journal and type
Publications of type Idiap-RR
2005
| Using Auxiliary Sources of Knowledge for Automatic Speech Recognition, , Idiap-RR-90-2005 |
2004
| A Sector-Based, Frequency-Domain Approach to Detection and Localization of Multiple Speakers, and , Idiap-RR-54-2004 |
| HMM/ANN Based Spectral Peak Location Estimation for Noise Robust Speech Recognition, , and , Idiap-RR-50-2004 |
| Modelling Auxiliary Features in Tandem Systems, , , and , Idiap-RR-21-2004 |
| On the Adequacy of Baseform Pronunciations and Pronunciation Variants, and , Idiap-RR-27-2004 |
| Phoneme vs Grapheme Based Automatic Speech Recognition, , , and , Idiap-RR-48-2004 |
| Spectro-Temporal Activity Pattern (STAP) Features for Noise Robust ASR, , , and , Idiap-RR-20-2004 |
| Towards using hierarchical posteriors for flexible automatic speech recognition systems, , , , , and , Idiap-RR-58-2004 |
2003
| Joint Decoding for Phoneme-Grapheme Continuous Speech Recognition, , and , Idiap-RR-52-2003 |
| Phoneme-Grapheme Based Speech Recognition System, , , and , Idiap-RR-37-2003 |
| Using pitch frequency information in speech recognition, , and , Idiap-RR-23-2003 |
2002
| Auxiliary Variables in Conditional Gaussian Mixtures for Automatic Speech Recognition, , and , Idiap-RR-25-2002 |
| Dynamic Bayesian Network Based Speech Recognition with Pitch and Energy as Auxiliary Variables, , , and , Idiap-RR-24-2002 |
| Modelling auxiliary information (pitch frequency) in hybrid HMM/ANN based ASR systems, , and , Idiap-RR-62-2002 |
| Speech recognition of spontaneous, noisy speech using auxiliary information in Bayesian networks, , and , Idiap-RR-44-2002 |
| Speech recognition with auxiliary information, , and , Idiap-RR-58-2002 |
2001
| Mixed Bayesian Networks with Auxiliary Variables for Automatic Speech Recognition, , and , Idiap-RR-45-2001 |
| Modeling Auxiliary Information in Bayesian Network Based ASR, , and , Idiap-RR-11-2001 |
| Pronunciation models and their evaluation using confidence measures, and , Idiap-RR-29-2001 |
2000
| Automatic Speech Recognition using Pitch Information in Dynamic Bayesian Networks, , and , Idiap-RR-41-2000 |
Bioacoustics: The International Journal of Animal Sound and its Recording
| On feature representations for marmoset vocal communication analysis, , , , and , in: Bioacoustics: The International Journal of Animal Sound and its Recording:1-15, 2025 |
[DOI] [URL] |
Computer Speech and Language
| Adjustable Deterministic Pseudonymization of Speech, , and , in: Computer Speech and Language, 72, 2022 |
[DOI] |
| Articulatory feature based continuous speech recognition using probabilistic lexical modeling, and , in: Computer Speech and Language, 36:233-259, 2016 |
[DOI] |
Diabetes Care
| Listening to Hypoglycemia: Voice as a Biomarker for Detection of a Medical Emergency Using Machine Learning, , , , , , , , , , , and , in: Diabetes Care, 2025 |
[DOI] |
IEEE Open Journal of Signal Processing
| Posterior-based analysis of spatio-temporal features for Sign Language Assessment, , , , and , in: IEEE Open Journal of Signal Processing, 2025 |
[DOI] |
IEEE Signal Processing Letters
| Utterance Verification-based Dysarthric Speech Intelligibility Assessment using Phonetic Posterior Features, and , in: IEEE Signal Processing Letters, 28:224 - 228, 2021 |
[DOI] |
| A Posterior-Based Multi-Stream Formulation for G2P Conversion, and , in: IEEE Signal Processing Letters, 2017 |
| A Savitzky-Golay Filtering Perspective of Dynamic Feature Computation, , and , in: IEEE Signal Processing Letters, 20(3):281 -- 284, 2013 |
[DOI] |
IEEE Trans. on Speech and Audio Processing
| Speech recognition with auxiliary information, , and , in: IEEE Trans. on Speech and Audio Processing, 4, 2004 |
IEEE Transactions on Audio, Speech, and Language Processing
| Applying multi- and cross-lingual stochastic phone space transformations to non-native speech recognition, , , , and , in: IEEE Transactions on Audio, Speech, and Language Processing, 2013 |
[DOI] |
| Privacy-Sensitive Audio Features for Speech/Nonspeech Detection, , , and , in: IEEE Transactions on Audio, Speech, and Language Processing, 19(8), 2011 |
| Transcribing Mandarin Broadcast Speech Using Multi-Layer Perceptron Acoustic Features, , , , and , in: IEEE Transactions on Audio, Speech, and Language Processing, 19(8), 2011 |
[DOI] |
| Analysis of MLP Based Hierarchical Phoneme Posterior Probability Estimator, , , , and , in: IEEE Transactions on Audio, Speech, and Language Processing, 19(2):225-241, 2011 |
IEEE Transactions on Information Forensics and Security
| On Joint Optimization of Automatic Speaker Verification and Anti-spoofing in the Embedding Space, , , , and , in: IEEE Transactions on Information Forensics and Security, 16:1579--1593, 2021 |
[DOI] |
| A Fast Parts-based Approach to Speaker Verification using Boosted Slice Classifiers, , and , in: IEEE Transactions on Information Forensics and Security, 7(1):241-254, 2012 |
IEEE/ACM Transactions on Audio, Speech and Language Processing
| Long-Term Spectral Statistics for Voice Presentation Attack Detection, , , and , in: IEEE/ACM Transactions on Audio, Speech and Language Processing, 25(11):2098-2111, 2017 |
Information
| Subunits Inference and Lexicon Development Based on Pairwise Comparison of Utterances and Signs, and , in: Information, 10:298, 2019 |
[DOI] [URL] |
Neural Networks
| Deep learning architectures for estimating breathing signal and respiratory parameters from speech recordings, , , , and , in: Neural Networks, 141:211--224, 2021 |
[DOI] |
PLoS Computational Biology
| Signal-to-signal neural networks for improved spike estimation from calcium imaging data, , , and , in: PLoS Computational Biology, 17(3):1--19, 2021 |
[DOI] |
PLoS One
| Measuring negative emotions and stress through acoustic correlates in speech: A systematic review, , , and , in: PLoS One, 20(7), 2025 |
[DOI] |
Sadhana
| Current trends in multilingual speech processing, , , , , , , , and , in: Sadhana, 36(5):885–915, 2011 |
[DOI] [URL] |
Scientific Reports
| Multidisciplinary characterization of embarrassment through behavioral and acoustic modeling, , , , , and , in: Scientific Reports, 2025 |
Speech Communication
| End-to-End Acoustic Modeling using Convolutional Neural Networks for HMM-based Automatic Speech Recognition, , and , in: Speech Communication, 108:15--32, 2019 |
[DOI] |
| Towards Weakly Supervised Acoustic Subword Unit Discovery and Lexicon Development Using Hidden Markov Models, , and , in: Speech Communication, 96:168-183, 2018 |
[DOI] |
| Acoustic data-driven grapheme-to-phoneme conversion in the probabilistic lexical modeling framework, , and , in: Speech Communication, 80, 2016 |
[DOI] |
| Acoustic and Lexical Resource Constrained ASR using Language-Independent Acoustic Model and Language-Dependent Probabilistic Lexical Model, and , in: Speech Communication, 68:23–40, 2015 |
[DOI] [URL] |
| Phase AutoCorrelation (PAC) features for noise robust speech recognition, , , and , in: Speech Communication, 54(7):867–880, 2012 |
[DOI] |
The Phonetician
| On Learning Grapheme-to-Phoneme Relationships through the Acoustic Speech Signal, and , in: The Phonetician, 109–110:6-23, 2014 |
Automatic Assessment of Parkinsonian Speech (2025)
| On Detection of Depression in Parkinson's Disease Patients' Speech: Handcrafted Features vs. Speech Foundation Models, , , and , in: Automatic Assessment of Parkinsonian Speech, Springer Nature Switzerland AG, 2025 |
[URL] |
Interactive Multimodal Information Management (2013)
| Speech Processing, , in: Interactive Multimodal Information Management, pages 221--245, EPFL Press, 2013 |