Keywords:
- Accent Identification
- Accented speech
- Accentual mismatch
- Acoustic model adaptation
- acoustic modeling
- Ad hoc microphone array calibration
- Ad-hoc microphone calibration
- adaptive budget allocation
- adaptive layer norm
- adaptive training
- adaptive TTS
- Adequacy of diffuseness
- Afrikaans
- All pass warp
- ASR
- audio processing
- aurora
- Automatic prosodic event detection
- Automatic Speech Recognition
- Bayesian transfer learning
- Benchmarking
- benchmarks
- bilingual speakers
- Blizzard Challenge
- Broadband beam-pattern
- broadcast news
- Cadzow algorithm
- catastrophic forgetting
- cepstral normalisation
- cochlear model
- cochlear models
- Code-Switching
- conditional layer normalization
- Confidence Measure (CM)
- connectionist temporal classification (ctc)
- constrained structural maximum a posteriori linear regression
- continuous F0 coding
- Conversational technologies
- crosslingual adaptation
- Deep learning for speech
- deep MLPs
- deep neural networks
- Delay-and-sum beamformer
- dialectal lexicon
- Diffuse field coherence model
- Diffuse noise coherence
- Diffuse sound coherence model
- diffusion model
- diffusion transformer
- Digital IIR Filters
- direction of arrival
- Directivity
- Distant speech recognition
- Distributed source localization
- dnn
- dnn-based speech recognition
- domain adaptation
- duration
- Emotion Recognition
- emotional speech synthesis
- emotional TTS
- emphasis
- end-to-end
- end-to-end architectures
- energy
- Environmental mismatch
- Euclidean distance matrix
- fast adaptation
- fast training
- filterbanks
- French accents
- French Regional Accents
- French TTS
- Fujisaki Model
- gamma-tone filter
- Generalized Trust Region Subproblem (GTRS)
- German
- German language
- GMM Modelling
- hidden Markov models
- HMM-based speech synthesis
- HSMM explicit duration modelling
- hybrid system
- i-vectors
- Image Model
- importance score
- intonation
- KL-HMM
- Kullback-Leibler divergence
- Laplace approximation
- Lexicon
- low bit rate speech coding
- low-rank adaptation
- LVCSR
- Matrix completion
- modelling
- multi-dialect
- multilayer perceptron
- Multilingual
- multilingual acoustic modeling
- multilingual ASR
- multilingual speech recognition
- Multimodal interaction
- nearest neighbour rule of classification
- neural network features
- neural networks
- NLP
- Noise Robustness
- open vocabulary
- open-vocabulary
- Out-Of-Language (OOL) detection
- Overlapping Speech
- parameter-efficient fine-tuning
- parametric speech synthesis
- Parametric vocoding
- pattern matching
- phone duration modelling
- Phonological features
- phonological posteriors
- phonology
- pitch analysis
- pitch model
- Pitch modelling
- pitch target approximation
- pitch target realisation
- Posterior features
- pretrained language model
- probabilistic amplitude demodulation
- prosody
- Prosody Modelling
- punctuation
- real-time audio processing
- recurrent neural network
- reliability estimation
- Reverberant enclosure
- Robust microphone placement
- S-stress
- Saliency Mapping
- self-supervision
- Semi-supervised training
- Semidefinite programming
- sentence boundary prediction
- SGMM adaptation
- SincNet
- Single-channel source localization
- SNR spectrum
- Source localization
- Sparse Component Analysis
- speaker adaptation
- spectral amplitude modulation phase hierarchy
- speech coding
- speech corpus
- speech meta-data
- speech prosody
- speech recognition
- speech synthesis
- Speech Translation
- speech-to-speech translation
- spiking neural networks
- Spoken Language Understanding
- Spoken Term Detection (STD)
- Statistical parametric speech synthesis
- Subspace Gaussian Mixture Models
- subword segmentation
- Subword unit
- Superdirective beamformer
- Support Vector Regression
- SVM
- Swiss German
- Swiss prosody
- Swisscom
- synchronisation
- Tandem
- temporal alignment
- text-to-speech
- time synchronisation
- time synchronization
- time-frequency analysis
- trainable filterbanks
- TTS
- TV Box
- Under-resourced data
- under-resourced languages
- under-resourced speech recognition
- unit selection
- universal phoneme set
- VAE
- variational inference
- Very low bit rate speech coding
- vocal tract length normalization
- voice assistant
- VTLN
- Wav2vec
- word emphasis
- zero-shot speaker adaptation
Publications of Philip N. Garner sorted by recency
- SVR vs MLP for Phone Duration Modelling in HMM-based Speech Synthesis, Idiap-RR-03-2014
- Prosody in Swiss French Accents: Investigation using Analysis by Synthesis, Idiap-RR-04-2014
- Evaluating Intra- and Crosslingual Adaptation for Non-native Speech Recognition in a Bilingual Environment, in: Proceedings of the 4th IEEE International Conference on Cognitive Infocommunications, IEEE, Budapest, Hungary, pages 357-361, 2013
- Crosslingual Tandem-SGMM: Exploiting Out-Of-Language Data for Acoustic Model and Feature Level Adaptation, in: Proceedings of the 14th Annual Conference of the International Speech Communication Association (Interspeech 2013), ISCA - International Speech Communication Association, Lyon, France, pages 510-514, 2013
- Crosslingual Tandem-SGMM: Exploiting Out-Of-Language Data for Acoustic Model and Feature Level Adaptation, Idiap-RR-39-2013
- Accent Adaptation Using Subspace Gaussian Mixture Models, in: The 38th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE, Vancouver, BC, Canada, pages 7170-7174, 2013
- Accent Adaptation Using Subspace Gaussian Mixture Models, Idiap-RR-38-2013
- Combining Vocal Tract Length Normalization with Hierarchical Linear Transformations, in: IEEE Journal of Selected Topics in Signal Processing - Special Issue on Statistical Parametric Speech Synthesis, 8(2):262-272, 2014
- Impact of deep MLP architecture on different acoustic modeling techniques for under-resourced speech recognition, in: Proceedings of the IEEE workshop on Automatic Speech Recognition and Understanding, 2013
- Euclidean Distance Matrix Completion for Ad-hoc Microphone Array Calibration, in: Proceedings IEEE International Conference On Digital Signal Processing, 2013
- Automatic Speech Indexing System of Bilingual Video Parliament Interventions, Idiap-RR-25-2013
- Syllable-based Pitch Encoding for Low Bit Rate Speech Coding with Recognition/Synthesis Architecture, Idiap-RR-24-2013
- Syllable-based Pitch Encoding for Low Bit Rate Speech Coding with Recognition/Synthesis Architecture, in: Proc. of Interspeech 2013, Lyon, France, 2013
- Applying multi- and cross-lingual stochastic phone space transformations to non-native speech recognition, in: IEEE Transactions on Audio, Speech, and Language Processing, 2013
- Statistical models for HMM/ANN hybrids, Idiap-RR-11-2013
- Bias Adaptation for Vocal Tract Length Normalization, Idiap-RR-12-2013
- On the (Un)importance of the Contextual Factors In HMM-Based Speech Synthesis, in: Proceedings of the IEEE Intl. Conference on Acoustics, Speech and Signal Processing (ICASSP), Vancouver, Canada, pages 8140-8143, 2013
- Convolutional Pitch Target Approximation Model for Speech Synthesis, Idiap-RR-05-2013
- Using out-of-language data to improve an under-resourced speech recognizer, Idiap-RR-09-2013
- Using out-of-language data to improve an under-resourced speech recognizer, in: Speech Communication, 2013
- A Simple Continuous Pitch Estimation Algorithm, in: IEEE Signal Processing Letters, 20(1):102-105, 2013
- On the (Un)importance of the Contextual Factors in HMM-Based Speech Synthesis and Coding, Idiap-RR-06-2013
- Combining Cepstral Normalization and Cochlear Implant-like Speech Processing for Microphone Array-based Speech Recognition, in: Proceedings of the IEEE Workshop on Spoken Language Technology, 2012
- MediaParl: Bilingual mixed language accented speech database, Idiap-RR-03-2013
- MediaParl: Bilingual mixed language accented speech database, in: Proceedings of the 2012 IEEE Workshop on Spoken Language Technology, pages 263-268, 2012
- From Research to Reality: Evaluation of a Single-Computer Real-Time LVCSR System for Speech-Based Retrieval, Idiap-RR-12-2017
- Application of Subspace Gaussian Mixture Models in Contrastive Acoustic Scenarios, Idiap-RR-20-2012
- Microphone Array Beampattern Characterization for Hands-free Speech Applications, in: IEEE 7th Sensor Array and Multichannel Signal Processing Workshop (SAM), Hoboken, NJ, USA, pages 473-476, 2012
- Comparing different acoustic modeling techniques for multilingual boosting, Idiap-RR-01-2013
- Comparing different acoustic modeling techniques for multilingual boosting, in: Proceedings of Interspeech, Portland, Oregon, 2012
- Vocal Tract Length Normalization for Statistical Parametric Speech Synthesis, in: IEEE Transactions on Audio, Speech and Language Processing, 2012
- Combining Vocal Tract Length Normalization with Hierarchical Linear Transformations, in: Proceedings in International conference on Speech and Signal processing, Kyoto, Japan, pages 4493-4496, IEEE SPS (ICASSP), 2012
- Boosting under-resourced speech recognizers by exploiting out of language data - Case study on Afrikaans, Idiap-RR-15-2012
- Bayesian Approaches to Uncertainty in Speech Processing, School of Computing Sciences, University of East Anglia, 2011
- Boosting under-resourced speech recognizers by exploiting out of language data - Case study on Afrikaans, in: Proceedings of the 3rd International Workshop on Spoken Languages Technologies for Under-resourced Languages, Cape Town, pages 60-67, 2012
- Progress report of a project in very low bit-rate speech coding, Idiap-RR-08-2012
- Using KL-divergence and multilingual information to improve ASR for under-resourced languages, in: Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, Kyoto, pages 4869-4872, 2012
- Broadband Beampattern for Multi-Channel Speech Acquisition and Distant Speech Recognition, Idiap-RR-39-2011
- Current trends in multilingual speech processing, in: Sadhana, 36(5):885–915, 2011
- Transcribing meetings with the AMIDA systems, in: IEEE Transactions on Audio, Speech, and Language Processing, 20(2):486-498, 2012