Keywords:
- acoustic modeling
- Adaboost
- Alzheimer's disease
- Anti-spoofing
- articulatory features
- Artificial Neural Networks
- atypical speech
- Automatic accent assessment
- Automatic accent evaluation
- automatic gender recognition
- Automatic speaker verification (ASV)
- Automatic Speech Recognition
- automatic subword unit derivation
- bag of audio words
- bandwidth
- Binary features
- binary masking
- bioacoustics
- BoAW
- boosting
- breathing pattern estimation
- breathing patterns
- call type classification
- call-type and caller classification
- Children speech recognition
- Classification
- CNN visualization
- ComParE features
- computational efficiency
- Conditional Random Fields
- confidence measures
- continuous speech recognition
- boosted binary features
- resource management
- Convolution Neural Network
- Convolutional neural network
- Convolutional Neural Networks
- COVID-19 identification
- cross-database
- cross-transfer knowledge
- Customer satisfaction
- deep learning
- deep neural networks
- depression detection
- Direction of arrival estimation
- dynamic programming
- Dysarthria
- Dysarthric speech
- embedding
- Emotion Recognition
- end-to-end acoustic modeling
- End-to-end learning
- end-to-end modelling
- end-to-end training
- expected performance and spoofability curve
- Expressive Vocalizations
- feature representations
- feature selection
- Few-shot learning
- fine-tuning
- fixed-size word patterns
- Formant identification
- Formants
- Foundation Model
- Fundamental frequency
- Fusion
- Gaussian mixture
- glottal source signals
- grapheme
- Grapheme subword units
- grapheme subwords
- grapheme-to-phoneme conversion
- grapheme-to-phoneme converter
- Graphemes
- Hidden Markov Model
- hidden Markov models
- human skeleton estimation
- human speech
- integration of ASV and anti-spoofing
- Interpretable Models
- isolated word recognition
- Kalman filters
- KL-divergence
- KL-HMM
- Kullback-Leibler divergence
- Kullback-Leibler divergence based hidden Markov model
- Kullback-Leibler divergence based HMM
- Kullback–Leibler divergence based hidden Markov model
- language disorder
- Language Production
- Large Language Models
- letter-to-sound rules
- lexical model
- Lexical modeling
- Lexicon
- local posterior probability
- localization
- long-term statistics
- LoRA
- low level descriptors
- Mental Lexicon
- microphone array
- microphone arrays
- mobile biometrics
- modalities fusion
- modified ZFF
- multi-layer perceptron
- Multi-modal Approach
- multi-stream combination
- Multi-task learning
- multilayer perceptron
- multilayer perceptron network
- multilingual acoustic modeling
- multiple linear regression
- Multiple speaker localization
- multiple speakers
- multiple-stream combination
- multitask learning
- neural network
- neurocomputational models
- Noise Robustness
- non-native speech
- non-native speech recognition
- Objective Evaluation
- Objective intelligibility
- Objective intelligibility Assessment
- objective measures
- overlapping speech recognition
- Paralinguistic speech processing
- Parkinson's disease
- Parkinson's disease detection
- Parkinson’s disease
- parts-based approach
- Pathological speech
- Pathological Speech Processing
- PEFT
- Perceived fluency
- phoneme
- phoneme modeling
- Phoneme recognition
- phoneme subword units
- phoneme subwords
- phonemes
- Phonetic information
- phonetic representation
- Phonocardiogram
- Posterior features
- posterior probabilities
- pre-trained embedding
- pre-training domain
- predictive coding
- presentation attack
- Presentation Attack Detection
- probabilistic lexical modeling
- pronunciation generation
- pronunciation lexicon
- Raw Speech
- raw waveform modelling
- raw waveforms
- raw-waveform cnn
- Reading Assessment
- recognition
- recurrent neural network
- Respiratory parameters
- S1-S2 detection
- Scottish Gaelic
- segment-level training
- Self-Organizing Maps
- Self-supervised embedding
- self-supervised learning
- sign language assessment
- Sign language processing
- signal processing
- sleepiness
- speaker verification
- speaker-specific features
- spectral statistics
- Speech Analysis
- speech and audio
- speech assessment
- Speech breathing
- Speech Emotion Recognition
- Speech enhancement
- Speech for health
- Speech intelligibility
- speech pathology detection
- speech recognition
- speech separation
- speech synthesis
- Speech technology
- Spoken Language Understanding
- Spoofing
- spoofing detection
- Steered response power
- String matching
- SVM
- syllable-level-features
- syllables
- synthetic reference templates
- Synthetic speech
- TANDEM features
- template-based approach
- template-based system
- Text classification
- text-to-speech synthesis
- tracking
- under-resource speech recognition
- under-resourced languages
- universal phoneme set
- unsupervised adaptation
- utterance verification
- voice activity detection
- Voice Conversion
- zero frequency filter
- Zero frequency filtering
- zero-frequency filtering
- zero-resourced speech recognition
Publications of Mathew Magimai.-Doss sorted by journal and type
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2013)
A Probabilistic Framework for Multiple Speaker Localization, in: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2013
Proceedings of Interspeech (2013)
Estimating Phoneme Class Conditional Probabilities from Raw Speech Signal using Convolutional Neural Networks, in: Proceedings of Interspeech, 2013
Proceedings of IEEE TENCON (2013)
Gammatone Wavelet Cepstral Coefficients for Robust Speech Recognition, in: Proceedings of IEEE TENCON, 2013
IEEE International Conference on Acoustics, Speech and Signal Processing (2013)
Grapheme and Multilingual Posterior Features for Under-Resourced Speech Recognition: A Study on Scottish Gaelic, in: IEEE International Conference on Acoustics, Speech and Signal Processing, 2013
Proceedings of Interspeech (2013)
Improving Grapheme-based ASR by Probabilistic Lexical Modeling Approach, in: Proceedings of Interspeech, 2013
Proceedings of the IEEE workshop on Automatic Speech Recognition and Understanding (2013)
Probabilistic Lexical Modeling and Unsupervised Training for Zero-Resourced ASR, in: Proceedings of the IEEE workshop on Automatic Speech Recognition and Understanding, 2013
20th European Signal Processing Conference (2012)
A TDOA Gaussian Mixture Model for Improving Acoustic Source Tracking, in: 20th European Signal Processing Conference, 2012
Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing (2012)
Acoustic Data-driven Grapheme-to-Phoneme Conversion using KL-HMM, in: Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, 2012
Symposium on Machine Learning in Speech and Language Processing (MLSLP) (2012)
Boosting localized binary features for speech recognition, in: Symposium on Machine Learning in Speech and Language Processing (MLSLP), 2012
Proceedings of Interspeech (2012)
Combination of Sparse Classification and Multilayer Perceptron for Noise Robust ASR, in: Proceedings of Interspeech, 2012
Combining Acoustic Data Driven G2P and Letter-to-Sound Rules for Under Resource Lexicon Generation, in: Proceedings of Interspeech, Portland, Oregon, 2012
Statistical and Perceptual Audition Workshop (2012)
Joint Detection and Localization of Multiple Speakers using a Probabilistic Interpretation of the Steered Response Power, in: Statistical and Perceptual Audition Workshop, 2012
Proceedings of Interspeech (2012)
Synthetic References for Template-based ASR using Posterior Features, in: Proceedings of Interspeech, Portland, Oregon, USA, 2012
SAPA-SCALE Conference, International Speech Communication Association (2012)
Template-based ASR using Posterior features and synthetic references: comparing different TTS systems, in: SAPA-SCALE Conference, International Speech Communication Association, 2012
Proceedings of Interspeech (2012)
Using Sparse Classification Outputs as Feature Observations for Noise Robust ASR, in: Proceedings of Interspeech, 2012
Proceedings of Interspeech 2011 (2011)
Analysis and Comparison of Recent MLP Features for LVCSR Systems, in: Proceedings of Interspeech 2011, 2011
Proceedings of the IEEE workshop on Automatic Speech Recognition and Understanding (2011)
Fast and flexible Kullback-Leibler divergence based acoustic modeling for non-native speech recognition, in: Proceedings of the IEEE workshop on Automatic Speech Recognition and Understanding, Hawaii, USA, pages 348-353, 2011
IAPR IEEE International Joint Conference on Biometrics (2011)
Fast Speaker Verification on Mobile Phone data using Boosted Slice Classifiers, in: IAPR IEEE International Joint Conference on Biometrics, Washington DC, 2011
Proceedings of Interspeech (2011)
Grapheme-based Automatic Speech Recognition using KL-HMM, in: Proceedings of Interspeech, 2011
Hierarchical Tandem Features for ASR in Mandarin, in: Proceedings of Interspeech, 2011
Artificial Neural Networks and Machine Learning - ICANN 2011 (2011)
Improving Articulatory Feature and Phoneme Recognition using Multitask Learning, in: Artificial Neural Networks and Machine Learning - ICANN 2011, pages 299-306, Springer Berlin / Heidelberg, 2011
Proceedings of Interspeech (2011)
Improving non-native ASR through stochastic multilingual phoneme space transformations, in: Proceedings of Interspeech, Florence, Italy, pages 537-540, 2011
Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP (2011)
Integrating articulatory features using Kullback-Leibler divergence based acoustic model for phoneme recognition, in: Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP, pages 5192-5195, 2011
Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing (2011)
Language dependent universal phoneme posterior estimation for mixed language speech recognition, in: Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, Prague, Czech Republic, pages 5012-5015, 2011
IEEE Intl. Conference on Acoustics, Speech and Signal Processing 2011 (2011)
Phoneme Recognition using Boosted Binary Features, in: IEEE Intl. Conference on Acoustics, Speech and Signal Processing 2011, 2011
Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (2011)
Posterior Features for Template-based ASR, in: Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, Prague, Czech Republic, 2011
Proceedings of Interspeech, Japan (2010)
A Comparative Study of MLP Front-ends for Mandarin ASR, in: Proceedings of Interspeech, Japan, 2010
2010 IEEE International Conference on Acoustics, Speech and Signal Processing (2010)
Boosted Binary Features for Noise-Robust Speaker Verification, in: 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, Dallas, Texas, 2010
ICASSP 2010 (2010)
Evaluating the Robustness of Privacy-Sensitive Audio Features for Speech Detection in Personal Audio Log Scenarios, in: ICASSP 2010, 2010
Proceedings of Interspeech (2010)
Hierarchical Multilayer Perceptron based Language Identification, in: Proceedings of Interspeech, Makuhari, Japan, pages 2722-2725, 2010
Towards mixed language speech recognition systems, in: Proceedings of Interspeech, Makuhari, Japan, pages 278-281, 2010
Proceedings of the 10th Annual Conference of the International Speech Communication Association (Interspeech) (2009)
Hierarchical Processing of the Modulation Spectrum for GALE Mandarin LVCSR system, in: Proceedings of the 10th Annual Conference of the International Speech Communication Association (Interspeech), Brighton, 2009
Proceedings of Interspeech 2009 (2009)
Investigating Privacy-Sensitive Features for Speech Detection in Multiparty Conversations, in: Proceedings of Interspeech 2009, 2009
Proceedings of the IEEE workshop on Automatic Speech Recognition and Understanding (2009)
MLP Based Hierarchical System for Task Adaptation in ASR, in: Proceedings of the IEEE workshop on Automatic Speech Recognition and Understanding, Merano, Italy, 2009
Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2009)
Non-linear mapping for multi-channel speech separation and robust overlapping speech recognition, in: Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2009
Posterior features applied to speech recognition tasks with user-defined vocabulary, in: Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2009
Proceedings of ICMI-MLMI 2009 (2009)
Speaker Change Detection with Privacy-Preserving Audio Cues, in: Proceedings of ICMI-MLMI 2009, 2009
Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2009)
Volterra Series for Analyzing MLP based Phoneme Posterior Probability Estimator, in: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2009
IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP) (2008)
Exploiting Contextual Information for Improved Phoneme Recognition, in: IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 2008
3rd Joint Workshop on Multimodal Interaction and Related Machine Learning Algorithms MLMI'06 (2006)
Juicer: A Weighted Finite-State Transducer speech decoder, in: 3rd Joint Workshop on Multimodal Interaction and Related Machine Learning Algorithms MLMI'06, 2006
Proceedings of ICASSP 2006 (2006)
Threshold Selection for Unsupervised Detection, with an Application to Microphone Arrays, in: Proceedings of ICASSP 2006, 2006
Proceedings of ICASSP 2005 (2005)
A Sector-Based, Frequency-Domain Approach to Detection and Localization of Multiple Speakers, in: Proceedings of ICASSP 2005, 2005
Proceedings of INTERSPEECH 2005 (2005)
A Spectrogram Model for Enhanced Source Localization and Noise-Robust ASR, in: Proceedings of INTERSPEECH 2005, 2005
Proceedings of the 2005 IEEE ASRU Workshop (2005)
Unsupervised Spectral Subtraction for Noise-Robust ASR, in: Proceedings of the 2005 IEEE ASRU Workshop, 2005
Proceedings of ICASSP (2004)
Joint Decoding for Phoneme-Grapheme Continuous Speech Recognition, in: Proceedings of ICASSP, 2004
Proceedings of ICSLP (2004)
Modelling Auxiliary Features in Tandem Systems, in: Proceedings of ICSLP, 2004
Proceedings of the INTERSPEECH-ICSLP-04 (2004)
Spectro-Temporal Activity Pattern (STAP) Features for Noise Robust ASR, in: Proceedings of the INTERSPEECH-ICSLP-04, 2004
Proceedings of IEEE ASRU (2003)
Phoneme-Grapheme Based Speech Recognition System, in: Proceedings of IEEE ASRU, 2003
Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-03) (2003)
Speech recognition of spontaneous, noisy speech using auxiliary information in Bayesian networks, in: Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-03), 2003
Proceedings of Eurospeech (2003)
Using pitch frequency information in speech recognition, in: Proceedings of Eurospeech, 2003